Maintaining Consistency Within A Federation Infrastructure

ABSTRACT

Maintaining ring consistency when a node leaves a ring includes a first adjacent node to a leaving node receiving an indication, from the leaving node, indicating its intent to leave the ring. The first adjacent node sends a first indication to a second adjacent node indicating i) acceptance of the leaving node's intent to leave id-space ownership for a portion of the id-space between the leaving node and the first adjacent node, and ii) establishment of a one-way monitoring relationship between the first adjacent node and the second adjacent node. The first adjacent node receives a second indication, from the second adjacent node, indicating i) acceptance of the first adjacent node's intent to assume id-space ownership for the portion of the id-space between the leaving node and the first adjacent node, and ii) establishment of the one-way monitoring relationship between the second adjacent node and the first adjacent node.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 14/681,620, filed Apr. 8, 2015, entitled “MAINTAINING CONSISTENCY WITHIN A FEDERATION INFRASTRUCTURE”, which is a continuation of U.S. patent application Ser. No. 12/907,799, filed Oct. 19, 2010, entitled “MAINTAINING CONSISTENCY WITHIN A FEDERATION INFRASTRUCTURE”, which is a continuation-in-part of U.S. patent application Ser. No. 11/936,589, filed Nov. 7, 2007, entitled “MAINTAINING CONSISTENCY WITHIN A FEDERATION INFRASTRUCTURE”, which claims the benefit of and priority to U.S. provisional patent application Ser. No. 60/865,136, filed Nov. 9, 2006, entitled “P2P RING OF STORAGE” and is also a continuation-in-part of U.S. patent application Ser. No. 10/971,451, filed Oct. 22, 2004, entitled “RENDEZVOUSING RESOURCE REQUESTS WITH CORRESPONDING RESOURCES”. All of the above applications are incorporated herein by reference in their entireties.

BACKGROUND

1. Background and Relevant Art

Computer systems and related technology affect many aspects of society. Indeed, the computer system's ability to process information has transformed the way we live and work. Computer systems now commonly perform a host of tasks (e.g., word processing, scheduling, and database management) that prior to the advent of the computer system were performed manually. More recently, computer systems have been coupled to one another and to other electronic devices to form both wired and wireless computer networks over which the computer systems and other electronic devices can transfer electronic data. As a result, many tasks performed at a computer system (e.g., voice communication, accessing electronic mail, controlling home electronics, Web browsing, and printing documents) include the exchange of electronic messages between a number of computer systems and/or other electronic devices via wired and/or wireless computer networks.

However, to utilize a network resource to perform a computerized task, a computer system must have some way to identify and access the network resource. Accordingly, resources are typically assigned unique identifiers, for example, network addresses, that uniquely identify resources and can be used to distinguish one resource from other resources. Thus, a computer system that desires to utilize a resource can connect to the resource using the network address that corresponds to the resource. However, accessing a network resource can be difficult if a computer system has no prior knowledge of a network address for a network resource. For example, a computer system cannot print a document at a network printer unless the computer system (or another networked computer system) knows the network address of the network printer.

Accordingly, various mechanisms (e.g., Domain Name System (“DNS”), Active Directory (“AD”), Distributed File Systems (“DFS”)) have been developed for computer systems to identify (and access) previously unknown resources. However, due to the quantity and diversity of resources (e.g., devices and services) that are accessible via different computer networks, developers are often required to develop applications that implement a variety of different resource identification and access mechanisms. Each different mechanism may have different coding requirements and may not provide a developer with all the functionality that is needed in an application.

For example, although DNS has a distributed administration architecture (i.e., centralized management is not required), DNS is not sufficiently dynamic, is not self-organizing, supports only a weak data and query model, and has a fixed set of roots. On the other hand, AD is sufficiently dynamic but requires centralized administration. Further, aspects of different mechanisms may not be compatible with one another. For example, a resource identified using DNS may not be compatible with DFS routing protocols. Thus, a developer may be forced to choose the most suitable mechanism and forgo the advantages of other mechanisms.

Mechanisms for identifying resources can be particularly problematic in peer-to-peer networks. DNS provides a lookup service, with host names as keys and IP addresses as values, that relies on a set of special root servers to implement lookup requests. Further, DNS requires management of information (NS records) for allowing clients to navigate the name server hierarchy. Thus, a resource must be entered into DNS before the resource can be identified on a network. On larger scale networks where nodes frequently connect to and disconnect from the network, relying on entry of information is not always practical. Additionally, DNS is specialized to the task of finding hosts or services and is not generally applicable to other types of resources.

Accordingly, other mechanisms for resource identification and access have been developed to attempt to address these shortcomings. A number of mechanisms include distributed lookup protocols that are more scalable than DNS. These mechanisms use various node arrangements and routing algorithms to route requests to corresponding resources and to store information for lookup.

At least one of these mechanisms utilizes local multi-level neighbor maps at each node in a network to route messages to a destination node. This essentially results in an architecture where each node is a “root node” of a corresponding tree of nodes (the nodes in its neighbor map). Messages are incrementally routed to a destination ID digit by digit (e.g., ***6 => **46 => *346 => 2346, where *s represent wildcards). The routing efficiency of these types of mechanisms is O(log N) routing hops, and these mechanisms require nodes to maintain a routing table of O(log N) size.
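The digit-by-digit progression described above can be illustrated with a minimal Python sketch. It is not taken from any particular mechanism: the function names, the flat list of known nodes (standing in for a multi-level neighbor map), and the four-digit IDs are all assumptions made for illustration.

```python
from typing import List, Optional


def shared_suffix_len(a: str, b: str) -> int:
    """Count how many trailing digits two equal-length IDs have in common."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def next_hop(current: str, dest: str, known: List[str]) -> Optional[str]:
    """Pick a known node that matches more trailing digits of dest than current does."""
    have = shared_suffix_len(current, dest)
    better = [n for n in known if shared_suffix_len(n, dest) > have]
    if not better:
        return None
    # Assumption: take the smallest improvement, so progress is roughly one digit per hop.
    return min(better, key=lambda n: shared_suffix_len(n, dest))


def route(start: str, dest: str, known: List[str]) -> List[str]:
    """Follow next_hop until the destination is reached or no progress is possible."""
    path = [start]
    while path[-1] != dest:
        hop = next_hop(path[-1], dest, known)
        if hop is None:
            break
        path.append(hop)
    return path


# Hypothetical node IDs chosen to mirror the ***6 => **46 => *346 => 2346 example.
nodes = ["1116", "9946", "7346", "2346"]
path = route("5555", "2346", nodes)
```

With these hypothetical IDs, each hop fixes one more trailing digit of the destination, which is why the hop count grows with the number of digits rather than with the number of nodes.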

At least one other of these mechanisms assigns nodes a unique ID that is taken from a linear ring of numbers. Nodes maintain routing tables that contain pointers to their immediate successor node (according to ID value) and to those nodes whose ID values are the closest successor of the value ID+2^(L). The routing efficiency of these types of mechanisms is also O(log N) routing hops, and these mechanisms require nodes to maintain a routing table of O(log N) size.
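As a hedged illustration of the ID+2^(L) routing table just described, the following Python sketch computes, for each L, the closest successor of ID+2^(L) on a small ring. The 6-bit ring, the node IDs, and the function names are assumptions chosen for the example, not details from the mechanism itself.

```python
from bisect import bisect_left
from typing import List


def successor(sorted_ids: List[int], value: int, ring_size: int) -> int:
    """Return the first node ID at or after value, wrapping around the ring."""
    value %= ring_size
    i = bisect_left(sorted_ids, value)
    return sorted_ids[i % len(sorted_ids)]


def routing_table(node_id: int, sorted_ids: List[int], bits: int) -> List[int]:
    """One entry per L in [0, bits): the closest successor of node_id + 2**L."""
    ring_size = 1 << bits
    return [successor(sorted_ids, node_id + (1 << L), ring_size) for L in range(bits)]


# Hypothetical 6-bit ring (64 identifiers) with ten nodes.
ids = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
table_for_8 = routing_table(8, ids, bits=6)
table_for_56 = routing_table(56, ids, bits=6)
```

The table has one entry per bit of the ID, which is where the O(log N) table size comes from when IDs are spread evenly over the ring.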

At least one further mechanism requires O(log N^(1/d)) routing hops and requires nodes to maintain a routing table of O(D) size. Thus, the routing efficiency of all of these mechanisms depends, at least in part, on the number of nodes in the system.

Further, since IDs (for at least some of the mechanisms) can be uniformly distributed around a ring, there is always some possibility that routing between nodes on the ring will result in some inefficiency. For example, routing hops can cross vast geographic distances, cross more expensive links, or pass through insecure domains, etc. Additionally, when message routing involves multiple hops, there is some chance that such events will occur multiple times. Unfortunately, these mechanisms do not take into account the proximity of nodes (physical or otherwise) with respect to one another. For example, depending on node distribution on a ring, routing a message from New York to Boston could involve routing the message from New York, to London, to Atlanta, to Tokyo, and then to Boston.

Accordingly, at least one other more recent mechanism takes proximity into account by defining proximity as a single scalar proximity metric (e.g., IP routing hops or geographic distance). These mechanisms use the notion of proximity-based choice of routing table entries. Since there are many “correct” node candidates for each routing table entry, these mechanisms attempt to select a proximally close node from among the candidate nodes. To do so, these mechanisms can provide a function that allows each node to determine the “distance” of a node with a given IP address to itself. Messages are routed between nodes in closer proximity to make progress towards a destination before routing to a node that is further away. Thus, some resources can be conserved and routing is more efficient.

Unfortunately, these existing mechanisms typically do not provide for, among other things, symmetric relationships between nodes (i.e., if a first node considers a second node to be its partner, the second node considers the first node as a partner as well), routing messages in both directions (clockwise and counterclockwise) on a ring, partitioning linked lists of nodes based on a plurality of proximity metrics, and routing messages based on a plurality of proximity metrics. These deficiencies can limit dynamic, distributed, and efficient transfer of data between nodes of a network, such as, for example, when broadcasting data to all nodes of the network.

In some environments, safety mechanisms are used to ensure that node responsibilities do not inappropriately overlap. For example, a safety mechanism can be used to prevent two different nodes from claiming responsibility for a system resource (e.g., a message) or logical identity within that system. In some environments, liveness mechanisms are used to ensure that if a message is repeatedly sent to a target the message is accepted. Unfortunately, many existing asynchronous systems provide only limited safety and liveness mechanisms. For example, some asynchronous systems provide only eventual safety and liveness. Thus, these asynchronous systems are not suitable for various types of applications, such as, for example, authoritative storage.

BRIEF SUMMARY

The present application extends to methods, systems, and computer program products for maintaining ring consistency when a leaving node leaves a ring of nodes. In an embodiment, a first adjacent node to the leaving node receives an indication from the leaving node indicating the leaving node's intent to leave the ring of nodes. The first adjacent node sends a first indication to a second adjacent node that indicates acceptance of the leaving node's intent to leave id-space ownership for a portion of the id-space between the leaving node and the first adjacent node, and that indicates establishment of a one-way monitoring relationship between the first adjacent node and the second adjacent node. The first adjacent node also receives a second indication from the second adjacent node that indicates acceptance of the first adjacent node's intent to assume id-space ownership for the portion of the id-space between the leaving node and the first adjacent node, and that indicates establishment of the one-way monitoring relationship between the second adjacent node and the first adjacent node.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Additional features and advantages will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the teachings herein. Features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features can be obtained, a more particular description of the subject matter briefly described above will be rendered by reference to specific embodiments which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments and are not therefore to be considered to be limiting in scope, embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 illustrates an example of a federation infrastructure.

FIG. 2 illustrates an example of a computer architecture that facilitates routing requests indirectly to partners.

FIG. 3 illustrates an example binary relationship between nodes in a federation infrastructure in the form of a sorted list and corresponding ring.

FIG. 4 illustrates an example ring of rings that facilitates proximal routing.

FIG. 5 illustrates an example proximity induced partition tree of rings that facilitates proximal routing.

FIG. 5A illustrates the example proximity induced partition tree of rings of FIG. 5 with additional detail in portions of the partition tree of rings of FIG. 5.

FIG. 6 illustrates a suitable operating environment for the principles of the present invention.

FIG. 7 illustrates an example flow chart of a method for populating a node routing table that takes proximity criteria into account.

FIG. 8 illustrates an example flow chart of a method for partitioning the nodes of a federation infrastructure.

FIG. 9 illustrates an example flow chart of a method for populating a node routing table.

FIG. 10 illustrates an example flow chart of a method for numerically routing a message towards a destination node.

FIG. 11 illustrates an example flow chart of a method for proximally routing a message towards a destination node.

FIG. 12A illustrates an example of a node establishing membership within an existing federation.

FIG. 12B illustrates an example of nodes in a federation infrastructure exchanging messages.

FIG. 13 illustrates an example flow chart of a method for establishing membership within a federation infrastructure.

FIG. 14 illustrates an example flow chart of a method for maintaining membership within a federation infrastructure.

FIG. 15 illustrates an example flow chart of a method for discovering liveness information for another node.

FIG. 16 illustrates an example of a message model and related processing model.

FIG. 17 illustrates an example of a number of liveness interactions that can occur between a function layer and an application layer.

FIG. 18 illustrates an example of messages, forming part of a request-response message exchange pattern, being routed across nodes on a ring.

FIG. 19A illustrates an example ring architecture that facilitates one node monitoring another (e.g., subject) node.

FIG. 19B illustrates an example ring architecture that facilitates two nodes monitoring each other.

FIG. 19C illustrates an example ring architecture that facilitates arbitration when mutually monitoring nodes can each report that the other node is suspected of failing.

FIG. 20 illustrates an example flow chart of a method for one node to monitor another node.

FIG. 21 illustrates an example flow chart of a method for arbitrating between conflicting reports of suspected node failures.

FIG. 22A illustrates an example ring architecture that facilitates routing a message in accordance with a cached two-way agreement.

FIG. 22B illustrates an example ring architecture that facilitates routing a message in accordance with multiple cached two-way agreements.

FIGS. 23A through 23D illustrate an example ring architecture that facilitates formulating a cached two-way agreement.

FIG. 24 illustrates an example flow chart of a method for routing a message in accordance with a cached two-way agreement.

FIG. 25 illustrates an example flow chart of a method for routing a message in accordance with multiple cached two-way agreements.

FIG. 26 illustrates an example flow chart of a method for joining a two-way agreement.

FIG. 27 illustrates an example ring architecture that facilitates joining of a node to a ring of nodes within a federation.

FIG. 28 illustrates an example flow chart of methods for maintaining ring consistency when a joining node joins the ring of nodes from the perspectives of the joining node, a selected immediately adjacent node and another immediately adjacent node.

FIG. 29 illustrates an example flow chart of a method for maintaining ring consistency when a leaving node leaves the ring of nodes.

FIG. 30 illustrates an exemplary state diagram for a joining node joining the ring of nodes.

FIG. 31 illustrates an exemplary state diagram for a leaving node leaving the ring of nodes.

DETAILED DESCRIPTION

The present invention extends to methods, systems, and computer program products for allocating and reclaiming resources within a rendezvous federation. In some embodiments, a message is routed towards a destination node. A receiving node receives a message along with a destination identifier indicating a destination on the ring of nodes. The destination identifier is located between the receiving node and an immediate neighbor node. The immediate neighbor node is selected from among an immediate predecessor neighbor node and an immediate successor neighbor node.

The receiving node refers to a cached two-way agreement between the receiving node and the immediate neighbor node to determine the next appropriate node that is to receive the message. The cached two-way agreement at least implies a division of responsibility for the identifier space between the receiving node and an immediate neighbor node. The receiving node sends the message to the next appropriate component based on the determination of the next appropriate node.

In other embodiments, a two-way agreement between a current node and an immediate neighbor node is formulated. The current node accesses an indication that the configuration of the ring of nodes has changed. The indication is indicative of a change in at least a current immediate neighbor node. The current immediate neighbor node is selected from among a current immediate predecessor node and a current immediate successor node. The change results in a new immediate neighbor node.

The indication is a further indication of a need to formulate a two-way agreement dividing responsibility for at least unoccupied identifiers on the ring between the current node and the new immediate neighbor node. The current node and the new immediate neighbor node agree to a responsibility boundary between the current node and the new immediate neighbor node. The responsibility boundary divides responsibility for the unoccupied identifiers between the current node and the new immediate neighbor node. Unoccupied identifiers between the current node and the responsibility boundary become the responsibility of the current node. Likewise, unoccupied identifiers between the responsibility boundary and the new immediate neighbor node become the responsibility of the new immediate neighbor node.
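The division of unoccupied identifiers by a responsibility boundary can be sketched as follows. This is a minimal Python illustration under stated assumptions: the description above does not say where the boundary falls, so the midpoint policy used here, the 8-bit ring size, and all function names are hypothetical.

```python
RING_SIZE = 256  # hypothetical 8-bit identifier space


def clockwise_distance(a: int, b: int, ring_size: int = RING_SIZE) -> int:
    """Number of steps from a to b when moving clockwise around the ring."""
    return (b - a) % ring_size


def responsibility_boundary(current: int, neighbor: int, ring_size: int = RING_SIZE) -> int:
    """Assumed policy: place the boundary at the midpoint between the two nodes."""
    return (current + clockwise_distance(current, neighbor, ring_size) // 2) % ring_size


def responsible_node(identifier: int, current: int, neighbor: int,
                     boundary: int, ring_size: int = RING_SIZE) -> int:
    """Decide which node owns an identifier lying between current and its successor neighbor."""
    to_id = clockwise_distance(current, identifier, ring_size)
    to_boundary = clockwise_distance(current, boundary, ring_size)
    # Identifiers up to and including the boundary go to the current node (an assumption).
    return current if to_id <= to_boundary else neighbor


# Hypothetical nodes 30 and 64 agree on a boundary, then resolve identifiers against it.
boundary = responsibility_boundary(30, 64)
owner_of_40 = responsible_node(40, 30, 64, boundary)
owner_of_55 = responsible_node(55, 30, 64, boundary)
```

Once both nodes cache the same boundary, either node can decide locally whether to accept a message or pass it to its neighbor, which is the role the cached two-way agreement plays in the routing embodiments described above.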

Embodiments of the present invention may comprise or utilize a special purpose or general-purpose computer including computer hardware, as discussed in greater detail below. Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are physical storage media. Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: physical storage media and transmission media.

Physical storage media includes RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.

A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry or transport desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.

However, it should be understood that upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to physical storage media. For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface card, and then eventually transferred to computer system RAM and/or to less volatile physical storage media at a computer system. Thus, it should be understood that physical storage media can be included in computer system components that also (or even primarily) utilize transmission media.

Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.

In some embodiments, hardware modules, such as, for example, special purpose integrated circuits or gate arrays, are optimized to implement the principles of the present invention.

In this description and in the following claims, a “node” is defined as one or more software modules, one or more hardware modules, or combinations thereof, that work together to perform operations on electronic data. For example, the definition of a node includes the hardware components of a personal computer, as well as software modules, such as the operating system of the personal computer. The physical layout of the modules is not important. A node can include one or more computers coupled via a network. Likewise, a node can include a single physical device (such as a mobile phone or Personal Digital Assistant (“PDA”)) where internal modules (such as a memory and processor) work together to perform operations on electronic data. Further, a node can include special purpose hardware, such as, for example, a router that includes special purpose integrated circuits.

Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of node configurations, including personal computers, laptop computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, gateways, brokers, proxies, firewalls, redirectors, network address translators, and the like. The invention may also be practiced in distributed system environments where local and remote nodes, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.

Federation Architecture

FIG. 1 illustrates an example of a federation infrastructure 100. The federation infrastructure 100 includes nodes 101, 102, and 103, which can form different types of federating partnerships. For example, nodes 101, 102, and 103 can be federated among one another as peers without a root node. Each of nodes 101, 102, and 103 has a corresponding ID 171, 182, and 193, respectively.

Generally, nodes 101, 102, and 103 can utilize federation protocols to form partnerships and exchange information (e.g., state information related to interactions with other nodes). The formation of partnerships and exchange of information facilitates more efficient and reliable access to resources. Other intermediary nodes (not shown) can exist between nodes 101, 102, and 103 (e.g., nodes having IDs between 171 and 193). Thus, a message routed, for example, between node 101 and node 103, can pass through one or more of the other intermediary nodes.

Nodes in federation infrastructure 100 (including other intermediary nodes) can include corresponding rendezvous protocol stacks. For example, nodes 101, 102, and 103 include corresponding rendezvous protocol stacks 141, 142, and 143, respectively. Each of the protocol stacks 141, 142, and 143 includes an application layer (e.g., application layers 121, 122, and 123) and other lower layers (e.g., corresponding other lower layers 131, 132, and 133). Each layer in a rendezvous protocol stack is responsible for different functionality related to rendezvousing a resource request with a corresponding resource.

For example, other lower layers can include a channel layer, a routing layer, and a function layer. Generally, a channel layer is responsible for reliably transporting a message (e.g., using WS-ReliableMessaging and Simple Object Access Protocol (“SOAP”)) from one endpoint to another (e.g., from node 101 to node 103). The channel layer is also responsible for processing incoming and outgoing reliable messaging headers and maintaining state related to reliable messaging sessions.

Generally, a routing layer is responsible for computing the next hop towards a destination. The routing layer is also responsible for processing incoming and outgoing addressing and routing message headers and maintaining routing state. Generally, a function layer is responsible for issuing and processing rendezvous protocol messages such as join and depart requests, pings, updates, and other messages, as well as generation of responses to these messages. The function layer processes request messages from the routing layer and sends back corresponding response messages, if any, to the originating node using the routing layer. The function layer also initiates request messages and utilizes the routing layer to have the request messages delivered.

Generally, an application layer processes non-rendezvous protocol specific data delivered from the function layer (i.e., application messages). The function layer can access application data from the application layer and get and put application data in rendezvous protocol messages (e.g., pings and updates). That is, the function layer can cause application data to be piggybacked on rendezvous protocol messages and can cause the application data to be passed back to the application layer in receiving rendezvous protocol nodes. In some embodiments, application data is used to identify resources and resource interests. Thus, an application layer can include application specific logic and state that processes data received from and sent to the other lower layers for purposes of identifying resources and resource interests.

Federating Mechanisms

Nodes can federate using a variety of different mechanisms. A first federating mechanism includes peer nodes forwarding information to all other peer nodes. When a node is to join a federation infrastructure, the node utilizes a broadcast/multicast discovery protocol, such as, for example, WS-Discovery, to announce its presence and issues a broadcast/multicast find to detect other nodes. The node then establishes a simple forwarding partnership with other nodes already present on the network and accepts new partnerships with newly joining nodes. Thereafter, the node simply forwards all application specific messages to all of its partner nodes.

A second federating mechanism includes peer nodes that most efficiently transmit application specific messages to their destination(s). When a new node is to join a federation infrastructure, the new node utilizes a broadcast/multicast discovery protocol, such as, for example, WS-Discovery, to announce its presence and issues a broadcast/multicast find to detect other nodes that are part of the federation infrastructure. Upon detecting another node, the new node establishes a partnership with the other node. From the established partnership, the new node learns about the presence of other nodes already participating in the federation infrastructure. It then establishes partnerships with these newly-learned nodes and accepts any new incoming partnership requests.

Both node arrivals/departures and registrations of interest in certain application specific messages are flooded through the federation infrastructure, resulting in every node having global knowledge of other partner nodes and registrations of interest in application specific messages. With such global knowledge, any node can send application specific messages directly to the nodes that have expressed interest in the application specific message.

A third federating mechanism includes peer nodes indirectly forwarding all application specific messages to their destination(s). In this third mechanism, nodes are assigned identifiers (IDs), such as, for example, a 128-bit or 160-bit ID. The node responsible for maintaining a registration of interest in a given application specific message can be determined to be the one whose ID is closest to the one obtained by mapping (e.g., hashing) the destination identity (e.g., a URI) of the application specific message to this 128-bit or 160-bit ID-space.
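The mapping from a destination identity to a responsible node can be sketched in Python as follows. This is an illustrative sketch only: SHA-1 is used because it happens to yield a 160-bit value, the definition of "closest" as shortest distance in either direction around the ring is an assumption, and the URIs and function names are hypothetical.

```python
import hashlib
from typing import Iterable

ID_BITS = 160
RING_SIZE = 1 << ID_BITS


def uri_to_id(uri: str) -> int:
    """Map a destination identity (e.g., a URI) into the 160-bit ID-space by hashing."""
    return int.from_bytes(hashlib.sha1(uri.encode("utf-8")).digest(), "big")


def ring_distance(a: int, b: int, ring_size: int = RING_SIZE) -> int:
    """Shortest distance between two IDs, measured in either direction around the ring."""
    d = (a - b) % ring_size
    return min(d, ring_size - d)


def closest_node(target_id: int, node_ids: Iterable[int], ring_size: int = RING_SIZE) -> int:
    """Return the node ID closest to target_id; ties go to the lower ID (an assumption)."""
    return min(node_ids, key=lambda n: (ring_distance(n, target_id, ring_size), n))


# Hypothetical federation members, identified by hashing made-up endpoint URIs.
members = [uri_to_id(f"urn:example:node:{i}") for i in range(8)]
topic_id = uri_to_id("urn:example:topic:printers")
registrar = closest_node(topic_id, members)
```

Because every node applies the same hash and the same distance rule, any node that knows the member IDs reaches the same answer about which node holds the registration.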

In this third mechanism, node arrivals and departures are flooded over the entire fabric. On the other hand, registrations of interest in certain application specific messages are forwarded to the nodes determined to be responsible for maintaining such registration information. For scalability, load balancing, and fault-tolerance, the node receiving registration of interest in certain application specific messages can reliably flood that registration information within its neighborhood set. The neighborhood set for a specified node can be determined to be the set of nodes having IDs within a predefined range on either side of the ID of the specified node.
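A neighborhood set defined by a predefined range on either side of a node's ID can be computed as in the following Python sketch. The ring size, the range value, the node IDs, and the function name are assumptions chosen for illustration.

```python
from typing import Iterable, List


def neighborhood_set(node_id: int, all_ids: Iterable[int],
                     id_range: int, ring_size: int) -> List[int]:
    """Return IDs within id_range of node_id on either side, wrapping around the ring."""
    result = []
    for other in all_ids:
        if other == node_id:
            continue  # a node is not listed as its own neighbor in this sketch
        clockwise = (other - node_id) % ring_size
        counterclockwise = (node_id - other) % ring_size
        if min(clockwise, counterclockwise) <= id_range:
            result.append(other)
    return sorted(result)


# Hypothetical 8-bit ring with seven nodes and a range of 20 identifiers.
ids = [2, 30, 46, 50, 64, 76, 250]
around_64 = neighborhood_set(64, ids, id_range=20, ring_size=256)
around_2 = neighborhood_set(2, ids, id_range=20, ring_size=256)
```

The second call shows the wrap-around case: node 250 is only eight identifiers away from node 2 when measured across the top of the ring.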

Similar to the second mechanism, a newly-joining node utilizes a broadcast/multicast discovery protocol, such as, for example, WS-Discovery, to announce its presence and issues a local broadcast/multicast find to detect a node that is already part of the federation infrastructure. The new node establishes a partnership with the discovered node and uses that partnership to learn about the presence of other new nodes participating in the federation infrastructure. The new node then establishes further partnerships with the newly discovered nodes and accepts any new incoming partnership requests. The new node accepts incoming registrations of interest in certain application layer specific resources from its partners for which it is responsible and may flood them over its neighborhood set. Thus, messages can generally be forwarded to their final destination via intermediary routing nodes (e.g., that a newly joining node has partnered with or that a partner node is aware of).

In response to receiving an incoming application specific message, the new node forwards the message to the partner node that may be responsible for maintaining the registration information for the destination specified in the message. Thus, when using this third mechanism, every node in the federation infrastructure has global knowledge of all other nodes but the registration information is efficiently partitioned among the nodes. Application specific messages are transmitted to their final destination via only the partner nodes that may have the responsibility for maintaining registration information of interest in those application specific messages. Thus, indirection is accomplished by forwarding only to the partner node that has global knowledge of the registration information of interest for the message being processed. This is in contrast to the first mechanism where the indirection is accomplished by forwarding to all the partner nodes.

A fourth federating mechanism includes peer nodes that route messages to other peer nodes. This fourth mechanism differs from the third mechanism at least in that both node arrivals/departures and registrations of interest in certain application specific messages are all routed instead of being flooded. Routing protocols are designed to guarantee rendezvous between application specific messages and the registration messages that express interest in those application specific messages.

FIG. 2 illustrates an example of a computer architecture 200 that facilitates routing requests indirectly to partners. Computer architecture 200 depicts different types of computer systems and devices potentially spread across multiple local discovery scopes participating in a federation infrastructure.

Workstation 233 can include a registered PnP provider instance. To inform its partners of the presence of this PnP provider instance, workstation 233 routes registration request 201 over the federation infrastructure. Registration request 201 is initially forwarded to laptop 231, which in turn forwards registration request 201 to message broker 237, which in turn forwards registration request 201 to message gateway 241. Message gateway 241 saves the registration information of registration request 201 in its database and returns success message 204 to workstation 233.

Subsequently, another registered provider instance, this time that of running services, comes alive within workstation 233. This time the node is aware that message gateway 241 is responsible for registrations and forwards registration request 205 to message gateway 241 directly. Message gateway 241 saves the registration information of registration request 205 in its database and returns success message 206 to workstation 233.

Subsequently, printer 236 (e.g., a UPnP printer) is powered on and sends announcement 207. Server 234 detects announcement 207 and routes registration request 208 to message broker 237. Message broker 237 forwards registration request 208 to message gateway 241. Message gateway 241 saves the registration information of registration request 208 in its database and returns success message 210 to server 234.

Subsequently, personal computer 242 issues lookup request 211 to discover all devices. Since personal computer 242 doesn't know where to forward lookup request 211, it routes lookup request 211 through workstation 243. As registration and lookup requests are routed to the same destination, the routing protocol essentially guarantees rendezvous between the two requests, resulting in workstation 243 forwarding lookup request 211 to message gateway 241. Message gateway 241 looks up the registration information maintained by it and forwards lookup request 211 to both workstation 233 and server 234. Workstation 233 and server 234 send response messages 214 and 216 respectively to personal computer 242.

This fourth mechanism works by routing (instead of flooding) a request to the node (message gateway 241) that has global knowledge of the registrations specified in a request. This fourth mechanism, as will be described in further detail below, essentially guarantees that routing can be accomplished in O(log N) hops, where N is the number of nodes participating in the federation infrastructure. Since this fourth mechanism efficiently partitions both node partnership and registration information, it scales to very large networks, even the Internet.

Although a number of federating mechanisms have been described, it would be apparent to one skilled in the art, after having reviewed this description, that other federation mechanisms are possible.

Relationship Between Nodes In A Federation

Accordingly, a federation consists of a set of nodes that cooperate among themselves to form a dynamic and scalable network in which information can be systematically and efficiently disseminated and located. Nodes are organized to participate in a federation as a sorted list using a binary relation that is reflexive, anti-symmetric, transitive, total, and defined over the domain of node identities. Both ends of the sorted list are joined, thereby forming a ring. Thus, each node in the list can view itself as being at the middle of the sorted list (as a result of using modulo arithmetic). Further, the list is doubly linked so that any node can traverse the list in either direction.

Each federating node can be assigned an ID (e.g., by a random number generator with duplicate detection) from a fixed set of IDs between 0 and some fixed upper bound. Thus, adding 1 to an ID of the fixed upper bound results in an ID of zero (i.e., moving from the end of the linked list back to the beginning of the linked list). In addition, a 1:1 mapping function from the value domain of the node identities to the nodes themselves is defined.

FIG. 3 depicts an example linked list 304 and corresponding ring 306. Given such a ring, the following functions can be defined:

-   RouteNumerically(V, Msg): Given a value V from the value domain of node identities and a message “Msg,” deliver the message to node X whose identity can be mapped to V using the mapping function.
-   Neighborhood(X, S): Neighborhood is the set of nodes on either side of node X with cardinality equal to S.

When every node in the federation has global knowledge of the ring, RouteNumerically(V, Msg) is implemented by directly sending Msg to the node X, whose identity is obtained by applying the mapping function to V. Alternately, when nodes have limited knowledge of other nodes (e.g., only of immediately adjacent nodes), RouteNumerically(V, Msg) is implemented by forwarding the message to consecutive nodes along the ring until it reaches the destination node X.

Alternately (and advantageously), nodes can store enough knowledge about the ring to perform a distributed binary search (without having to have global knowledge or implement routing between immediately adjacent nodes). The amount of ring knowledge is configurable such that maintaining the ring knowledge has a sufficiently small impact on each node but allows increased routing performance from the reduction in the number of routing hops.

As previously described, IDs can be assigned using the “<” (less than) relation defined over a sufficiently large, bounded set of natural numbers, meaning its range is over a finite set of numbers between 0 and some fixed value, inclusive. Thus, every node participating in the federation is assigned a natural number that lies between 0 and some appropriately-chosen upper bound, inclusive. The range does not have to be tight and there can be gaps between numbers assigned to nodes. The number assigned to a node serves as its identity in the ring. The mapping function accounts for gaps in the number space by mapping a number falling in between two node identities to the node whose identity is numerically closest to the number.
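One possible sketch of such a mapping function is given below. It assumes an 8-bit ID space and a hypothetical set of node IDs chosen for illustration; the tie-breaking rule (the lower of two equidistant neighbors wins) is likewise an assumption, since the description does not specify one.

```python
import bisect

ID_SPACE = 2 ** 8   # assumption: a small 8-bit ring for readability
NODE_IDS = [2, 30, 46, 50, 64, 76, 83, 98, 135, 200, 250]  # hypothetical, sorted


def ring_distance(a, b):
    """Shortest distance between two IDs, taking wrap-around into account."""
    d = (a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def map_to_node(value, node_ids=NODE_IDS):
    """Map any number in the ID space to the numerically closest node ID."""
    value %= ID_SPACE
    i = bisect.bisect_left(node_ids, value)
    below = node_ids[i - 1]               # index -1 wraps to the largest ID
    above = node_ids[i % len(node_ids)]   # past the end wraps to the smallest ID
    # On a tie, min() keeps the first candidate, i.e. the node below the value.
    return min((below, above), key=lambda n: ring_distance(n, value))
```

Only the two nodes adjacent to the value need to be compared, so the lookup costs one binary search over the sorted IDs.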

This approach has a number of advantages. By assigning each node a uniformly-distributed number, there is an increased likelihood that all segments of the ring are uniformly populated. Further, successor, predecessor, and neighborhood computations can be done efficiently using modulo arithmetic.

In some embodiments, federating nodes are assigned an ID from within an ID space so large that it is highly unlikely that two nodes are assigned the same ID (e.g., when random number generation is used). For example, a node can be assigned an ID in the range of 0 to b^(n)-1, where b equals, for example, 8 or 16 and n equals, for example, 128-bit or 160-bit equivalent digits. Accordingly, a node can be assigned an ID, for example, from a range of 0 to 16⁴⁰-1 (or approximately 1.461502E48). The range of 0 to 16⁴⁰-1 would provide, for example, a sufficient number of IDs to assign every node on the Internet a unique ID.
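The arithmetic above, together with random assignment and duplicate detection, can be sketched as follows. The function name and the use of a local set to detect duplicates are assumptions made for illustration; a real federation would need some distributed means of detecting a duplicate ID.

```python
import secrets

BASE, DIGITS = 16, 40        # b and n from the example in the text
ID_SPACE = BASE ** DIGITS    # IDs range over 0 .. b^n - 1 (equal to 2^160)


def assign_id(taken):
    """Draw a uniformly distributed ID, retrying on the (very unlikely) duplicate.

    `taken` is a hypothetical set of IDs already in use.
    """
    while True:
        candidate = secrets.randbelow(ID_SPACE)
        if candidate not in taken:
            taken.add(candidate)
            return candidate
```

Formatting `ID_SPACE` with six decimal places in scientific notation reproduces the approximate figure of 1.461502E48 quoted above.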

Thus, each node in a federation can have:

-   An ID which is a numerical value uniformly distributed in the range of 0 to b^(n)-1; and
-   A routing table consisting of (all arithmetic is done modulo b^(n)):
    -   Successor node (s);
    -   Predecessor node (p);
    -   Neighborhood nodes (p_(k), . . . , p_(1), p, s, s_(1), . . . , s_(j)) such that s_(j).s.id > (id + u/2), j ≧ v/2-1, p_(k).p.id < (id − u/2), and k ≧ v/2-1; and
    -   Routing nodes (r_(-(n-1)), . . . , r_(-1), r_(1), . . . , r_(n-1)) such that r_(±i) = RouteNumerically(id ± b^(i), Msg).

where b is the number base, n is the field size in number of digits, u is the neighborhood range, v is the neighborhood size, and the arithmetic is performed modulo b^(n). For good routing efficiency and fault tolerance, values for u and v can be u=b and v≧max(log₂(N), 4), where N is the total number of nodes physically participating in the federation. N can be estimated from the number of nodes present on a ring segment whose length is greater than or equal to b, for example, when there is a uniform distribution of IDs. Typical values for b and n are b=8 or 16 and n=128-bit or 160-bit equivalent digits.

Accordingly, routing nodes can form a logarithmic index spanning a ring. Depending on the locations of nodes on a ring, a precise logarithmic index is possible, for example, when there is an existing node at each number in the set of id±b^(i) where i=(1, 2, . . . (n-1)). However, it may be that there are not existing nodes at each number in the set. In those cases, a node closest to id±b^(i) can be selected as a routing node. The resulting logarithmic index is not precise and may even lack unique routing nodes for some numbers in the set.

Referring again to FIG. 3, FIG. 3 illustrates an example of a binary relation between nodes in a federation infrastructure in the form of sorted list 304 and corresponding ring 306. The ID space of sorted list 304 is in the range 0 to 2⁸-1 (or 255). That is, b=2 and n=8. Thus, nodes depicted in FIG. 3 are assigned IDs in a range from 0 to 255. Sorted list 304 utilizes a binary relation that is reflexive, anti-symmetric, transitive, total, and defined over the domain of node identities. Both ends of sorted list 304 are joined, thereby forming ring 306. This makes it possible for each node in FIG. 3 to view itself as being at the middle of sorted list 304. The sorted list 304 is doubly linked so that any node can traverse the sorted list 304 in either direction. Arithmetic for traversing sorted list 304 (or ring 306) is performed modulo 2⁸. Thus, 255 (or the end of sorted list 304) + 1 = 0 (or the beginning of sorted list 304).

The routing table indicates that the successor to ID 64 is ID 76 (the ID immediately clockwise from ID 64). The successor can change, for example, when a new node (e.g., with an ID of 71) joins or an existing node (e.g., ID 76) leaves the federation infrastructure. Likewise, the routing table indicates that the predecessor to ID 64 is ID 50 (the ID immediately counterclockwise from ID 64). The predecessor can change, for example, when a new node (e.g., with an ID of 59) joins or an existing node (e.g., ID 50) leaves the federation infrastructure.
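The successor and predecessor relationships, and how they change on a join, can be sketched as follows. The ring contents are an assumption (only some of the IDs of FIG. 3 are named in the text), and a sorted Python list stands in for the doubly linked sorted list.

```python
import bisect

# Hypothetical ring membership; only the IDs around 64 matter for the example.
RING = [2, 30, 46, 50, 64, 76, 83, 98, 135, 200, 250]


def successor(node_id, ring):
    """ID immediately clockwise from node_id; wraps from the end to the start."""
    i = bisect.bisect_right(ring, node_id)
    return ring[i % len(ring)]


def predecessor(node_id, ring):
    """ID immediately counterclockwise from node_id; wraps from start to end."""
    i = bisect.bisect_left(ring, node_id)
    return ring[i - 1]


def join(node_id, ring):
    """Return a new sorted ring that also contains node_id."""
    updated = list(ring)
    bisect.insort(updated, node_id)
    return updated
```

With this membership, the successor of 64 is 76 and its predecessor is 50; after a node with ID 71 joins, the successor of 64 becomes 71, and after a node with ID 59 joins, its predecessor becomes 59.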

The routing table further indicates that a set of neighborhood nodes to ID 64 have IDs 83, 76, 50 and 46. A set of neighborhood nodes can be a specified number of nodes (i.e., neighborhood size v) that are within a specified range (i.e., neighborhood range u) of ID 64. A variety of different neighborhood sizes and neighborhood ranges, such as, for example, v=4 and u=10, can potentially be used to identify the set of neighborhood nodes. A neighborhood set can change, for example, when nodes join or leave the federation infrastructure or when the specified number of nodes or specified range is changed.
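One reading of the neighborhood rule is sketched below: on each side of a node, neighbors are collected until at least v/2 nodes have been taken and the next node lies beyond u/2 of the node's ID. This interpretation, the ring membership, and the function name are assumptions made for illustration.

```python
ID_SPACE = 2 ** 8
# Hypothetical ring membership consistent with the IDs named in the text.
NODE_IDS = [2, 30, 46, 50, 64, 76, 83, 98, 135, 200, 250]


def neighborhood(node_id, u, v, node_ids=NODE_IDS):
    """Return (predecessor-side, successor-side) neighbors of node_id.

    u is the neighborhood range and v the neighborhood size. A candidate is
    kept while fewer than v/2 nodes were taken on that side, or while the
    candidate still lies within u/2 of node_id.
    """
    ring = sorted(node_ids)
    start = ring.index(node_id)
    sides = []
    for direction in (-1, +1):              # -1: predecessors, +1: successors
        taken = []
        for step in range(1, len(ring)):
            cand = ring[(start + direction * step) % len(ring)]
            gap = (direction * (cand - node_id)) % ID_SPACE
            if len(taken) < v // 2 or gap <= u / 2:
                taken.append(cand)
            else:
                break
        sides.append(taken)
    return sides[0], sides[1]
```

For ID 64 with v=4 and u=10, no node lies within 5 of 64, so the size constraint alone determines the result: predecessors 50 and 46, successors 76 and 83, matching the set given above.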

The routing table further indicates that ID 64 can route to nodes having IDs 200, 2, 30, 46, 50, 64, 64, 64, 64, 76, 83, 98, 135, and 200. This list is generated by identifying the node closest to each number in the set of id±2^(i) where i=(1, 2, 3, 4, 5, 6, 7). That is, b=2 and n=8. For example, the node having ID 76 can be identified from calculating the closest node to 64+2³, or 72.
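The generation of this list can be sketched as follows. The ring membership is an assumption assembled from the IDs named in the text, and ties between two equidistant nodes are broken in favor of the lower ID, which is also an assumption.

```python
RING_BITS = 8                 # n = 8 with b = 2
ID_SPACE = 2 ** RING_BITS
# Hypothetical ring membership; FIG. 3 itself is not reproduced here.
NODE_IDS = [2, 30, 46, 50, 64, 76, 83, 98, 135, 200, 250]


def ring_distance(a, b):
    """Shortest distance between two IDs on the ring."""
    d = (a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def closest(value, candidates):
    """Candidate closest to value; the first candidate wins a tie."""
    return min(candidates, key=lambda n: ring_distance(n, value))


def routing_nodes(node_id, node_ids=NODE_IDS):
    """Routing nodes r_(-(n-1)) .. r_(-1), r_(1) .. r_(n-1) for node_id."""
    minus = [closest((node_id - 2 ** i) % ID_SPACE, node_ids)
             for i in range(RING_BITS - 1, 0, -1)]
    plus = [closest((node_id + 2 ** i) % ID_SPACE, node_ids)
            for i in range(1, RING_BITS)]
    return minus + plus
```

Under these assumptions, `routing_nodes(64)` yields 200, 2, 30, 46, 50, 64, 64, 64, 64, 76, 83, 98, 135, 200, the same list as above; the repeated 64 entries arise because no other node is closer to 60, 62, 66, or 68.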

A node can route messages (e.g., requests for access to resources) directly to a predecessor node, a successor node, any node in a set of neighborhood nodes, or any routing node. In some embodiments, nodes implement a numeric routing function to route messages. Thus, RouteNumerically(V, Msg) can be implemented at node X to deliver Msg to the node Y in the federation whose ID is numerically closest to V, and return node Y's ID to node X. For example, the node having ID 64 can implement RouteNumerically(243, Msg) to cause a message to be routed to the node having ID 250. However, since ID 250 is not a routing node for ID 64, ID 64 can route the message to ID 2 (the closest routing node to 243). The node having ID 2 can in turn implement RouteNumerically(243, Msg) to cause the message to be routed (directly or through further intermediary nodes) to the node having ID 250. Thus, it may be that a RouteNumerically function is recursively invoked with each invocation routing a message closer to the destination.
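The hop-by-hop behavior just described can be simulated with the following sketch. It is written iteratively rather than recursively, assumes a hypothetical ring membership, and lets each node know only itself, its successor, its predecessor, and its routing nodes; message delivery itself is not modeled.

```python
RING_BITS = 8
ID_SPACE = 2 ** RING_BITS
# Hypothetical ring membership assembled from the IDs named in the text.
NODE_IDS = [2, 30, 46, 50, 64, 76, 83, 98, 135, 200, 250]


def ring_distance(a, b):
    d = (a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def closest(value, candidates):
    return min(candidates, key=lambda n: ring_distance(n, value))


def known_nodes(node_id, node_ids=NODE_IDS):
    """Nodes a given node can reach directly under the stated assumptions."""
    ring = sorted(node_ids)
    i = ring.index(node_id)
    known = {node_id, ring[(i + 1) % len(ring)], ring[i - 1]}
    for k in range(1, RING_BITS):
        known.add(closest((node_id + 2 ** k) % ID_SPACE, ring))
        known.add(closest((node_id - 2 ** k) % ID_SPACE, ring))
    return sorted(known)


def route_numerically(start, value, node_ids=NODE_IDS):
    """Return the list of node IDs visited while routing toward value."""
    path, current = [start], start
    while True:
        nxt = closest(value, known_nodes(current, node_ids))
        if ring_distance(nxt, value) >= ring_distance(current, value):
            return path          # no known node is closer: current is the target
        current = nxt
        path.append(current)
```

Under these assumptions, routing from ID 64 toward 243 visits 64, then 2, then 250, as in the example above. Including the successor and predecessor in each node's known set is what ensures that every hop makes progress until the closest node is reached.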

Proximity

Advantageously, other embodiments of the present invention facilitate partitioning a ring into a ring of rings or tree of rings based on a plurality of proximity criteria of one or more proximity categories (e.g., geographical boundaries, routing characteristics (e.g., IP routing hops), administrative domains, organizational boundaries, etc.). It should be understood that a ring can be partitioned more than once using the same type of proximity criteria. For example, a ring can be partitioned based on a continent proximity criterion and a country proximity criterion (both of a geographical boundaries proximity category).

Since IDs can be uniformly distributed across an ID space (a result of random number generation) there is a high probability that any given segment of a circular ID space contains nodes that belong to different proximity classes provided those classes have approximately the same cardinality. The probability increases further when there are a sufficient number of nodes to obtain meaningful statistical behavior.

Thus, neighborhood nodes of any given node are typically well dispersed from the proximity point of view. Since published application state can be replicated among neighborhood nodes, the published information can be well dispersed as well from the proximity point of view.

FIG. 4 illustrates a ring of rings 400 that facilitates proximal routing. Ring 401 can be viewed as a master or root ring, and contains all the nodes in each of the rings 402, 403, and 404. Each of the rings 402, 403, and 404 contains a subset of nodes from ring 401 that are partitioned based on a specified proximity criterion. For example, ring 401 may be partitioned based on geographic location, where ring 402 contains nodes in North America, ring 403 contains nodes in Europe, and ring 404 contains nodes in Asia.

In a numerical space containing 65,536 (2¹⁶) IDs, routing a message from a North American node having an ID 5,345 to an Asian node having an ID 23,345 can include routing the message within ring 402 until a neighbor node of the Asian node is identified. The neighbor node can then route the message to the Asian node. Thus, a single hop (as opposed to multiple hops) is made between a North American node and an Asian node. Accordingly, routing is performed in a resource efficient manner.

FIG. 5 illustrates an example proximity induced partition tree of rings 500 that facilitates proximal routing. As depicted, partition tree of rings 500 includes a number of rings. Each of the rings represents a partition of a sorted linked list. Each ring includes a plurality of nodes having IDs in the sorted linked list. However, for clarity due to the number of potential nodes, the nodes are not expressly depicted on the rings (e.g., the ID space of partition tree 500 may be b=16 and n=40).

Within partition tree 500, root ring 501 is partitioned into a plurality of sub-rings, including sub-rings 511, 512, 513, and 514, based on criterion 571 (a first administrative domain boundary criterion). For example, each component of a DNS name can be considered a proximity criterion with the partial order among them induced per their order of appearance in the DNS name read right to left. Accordingly, sub-ring 511 can be further partitioned into a plurality of sub-rings, including sub-rings 521, 522, and 523, based on criterion 581 (a second administrative domain boundary criterion).

Sub-ring 522 can be further partitioned into a plurality of sub-rings, including sub-rings 531, 532, and 533, based on criterion 572 (a geographic boundary criterion). Location based proximity criteria can be partially ordered along the lines of continents, countries, postal zip codes, and so on. Postal zip codes are themselves hierarchically organized, meaning that they can be seen as further inducing a partially ordered sub-list of proximity criteria.

Sub-ring 531 can be further partitioned into a plurality of sub-rings, including sub-rings 541, 542, 543, and 544, based on criterion 573 (a first organizational boundary criterion). A partially ordered list of proximity criteria can be induced along the lines of how a given company is organizationally structured such as divisions, departments, and product groups. Accordingly, sub-ring 543 can be further partitioned into a plurality of sub-rings, including sub-rings 551 and 552, based on criterion 583 (a second organizational boundary criterion).

Within partition tree 500, each node has a single ID and participates in rings along a corresponding partition path starting from the root to a leaf. For example, each node participating in sub-ring 552 would also participate in sub-rings 543, 531, 522, 511 and in root 501. Routing to a destination node (ID) can be accomplished by implementing a RouteProximally function, as follows:

-   RouteProximally(V, Msg, P): Given a value V from the domain of node identities and a message “Msg,” deliver the message to the node Y whose identity can be mapped to V among the nodes considered equivalent by the proximity criteria P.

Thus, routing can be accomplished by progressively moving closer to the destination node within a given ring until no further progress can be made by routing within that ring as determined from the condition that the destination node lies between the current node and its successor or predecessor node. At this point, the current node starts routing via its partner nodes in the next larger ring in which it participates. This process of progressively moving towards the destination node by climbing along the partitioning path towards the root ring terminates when the closest node to the destination node is reached within the requested proximal context, as originally specified in the RouteProximally invocation.

Routing hops can remain in the proximal neighborhood of the node that originated the request until no further progress can be made within that neighborhood because the destination node exists outside it. At this point, the proximity criterion is relaxed to increase the size of the proximal neighborhood to make further progress. This process is repeated until the proximal neighborhood is sufficiently expanded to include the destination node (ID). The routing hop made after each successive relaxation of proximal neighborhood criterion can be a potentially larger jump in proximal space while making a correspondingly smaller jump in the numerical space compared to the previous hop. Thus, only the absolutely required number of such (inter-ring) hops is made before the destination is reached.
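The climbing behavior described above can be approximated by the following sketch. It is a simulation under stated assumptions rather than the described protocol: each ring is identified by a hypothetical tuple of proximity labels, the node placement is invented for illustration, and within a ring each node knows only its successor, predecessor, and routing nodes in that ring.

```python
RING_BITS = 8
ID_SPACE = 2 ** RING_BITS

# Hypothetical federation: node ID -> proximity path from the root ring downward.
NODES = {
    2: ("com", "corp1"), 76: ("com", "corp1"),
    135: ("com", "corp1"), 250: ("com", "corp1"),
    30: ("com", "corp2", "locA"), 50: ("com", "corp2", "locA"),
    64: ("com", "corp2", "locA"), 98: ("com", "corp2", "locA"),
    46: ("com", "corp2", "locB"), 83: ("com", "corp2", "locB"),
    200: ("com", "corp2", "locB"),
}


def ring_distance(a, b):
    d = (a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def ring_members(prefix):
    """Sorted IDs of all nodes whose proximity path starts with prefix."""
    return sorted(n for n, p in NODES.items() if p[:len(prefix)] == prefix)


def known_in_ring(node_id, members):
    """Successor, predecessor, and routing nodes of node_id within one ring."""
    i = members.index(node_id)
    known = {node_id, members[(i + 1) % len(members)], members[i - 1]}
    for k in range(1, RING_BITS):
        for target in ((node_id + 2 ** k) % ID_SPACE, (node_id - 2 ** k) % ID_SPACE):
            known.add(min(members, key=lambda n: ring_distance(n, target)))
    return sorted(known)


def route_proximally(start, value, proximity=()):
    """Route toward value, relaxing proximity ring by ring up to `proximity`."""
    path = NODES[start]
    if path[:len(proximity)] != proximity:
        raise ValueError("start node does not participate in the requested ring")
    current, hops = start, [start]
    for depth in range(len(path), len(proximity) - 1, -1):
        members = ring_members(path[:depth])   # lowest ring first, then climb
        while True:
            nxt = min(known_in_ring(current, members),
                      key=lambda n: ring_distance(n, value))
            if ring_distance(nxt, value) >= ring_distance(current, value):
                break                          # no progress left in this ring
            current = nxt
            hops.append(current)
    return hops
```

Calling `route_proximally(64, 243, ("com", "corp2", "locA"))` stays inside the lowest ring, whereas an empty proximity tuple (standing for the root ring here) lets the route climb until the closest node in the whole federation is reached.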

It may be the case that some hops are avoided for lookup messages since published application data gets replicated down the partition tree when it is replicated among the neighborhood nodes of the destination node.

To accomplish proximal routing, each federation node maintains references to its successor and predecessor nodes in all the rings in which it participates as a member (similar to successor and predecessor for a single ring): the proximal predecessor, proximal successor, and proximal neighborhood. In order to make the routing efficient, the nodes can also maintain references to other nodes closest to an exponentially increasing distance on either half of the ring as routing partners (similar to routing nodes for a single ring). In some embodiments, routing partner nodes that lie between a pair of consecutive successor or predecessor nodes participate in the same lowest ring shared by the current node and the node numerically closest to it among the successor or predecessor node pairs respectively. Thus, routing hops towards a destination node transition into using a relaxed proximity criterion (i.e., transitioning to a higher ring) only when absolutely needed to make further progress. Accordingly, messages can be efficiently rendezvoused with a corresponding federation node.

In some embodiments, nodes implement a proximal routing function to route messages based on equivalence criteria relations. Thus, given a number V and a message “Msg”, a node can implement RouteProximally(V, Msg, P) to deliver the message to the node Y whose identity can be mapped to V among the nodes considered equivalent by proximity criterion P. The proximity criterion P identifies the lowest ring in the partition tree that is the common ancestor to all the nodes considered proximally equivalent by it. It can be represented as a string obtained by concatenating the proximity criteria found along the path from the root ring to the ring identified by it, separated by the path separator character ‘/’. For example, the proximity criterion identifying sub-ring 542 can be represented as “Proximity:/.COM/Corp2/LocationA/Div2”. Each ring in the partition tree 500 can be assigned a unique number, for example, by hashing its representational string with a SHA based algorithm. If the number 0 is reserved for the root ring, it can be inferred that RouteNumerically(V, Msg)=RouteProximally(V, Msg, 0).
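The string representation and ring numbering can be sketched as follows. SHA-1 is chosen here only as one example of "a SHA based algorithm", and the representation of the root ring as the bare prefix "Proximity:/" is an assumption.

```python
import hashlib

ROOT = "Proximity:/"    # assumption: how the root ring is represented


def proximity_string(*criteria):
    """Concatenate the criteria along the path from the root, separated by '/'."""
    return ROOT + "/".join(criteria)


def ring_number(representation):
    """Number assigned to a ring; 0 is reserved for the root ring."""
    if representation == ROOT:
        return 0
    digest = hashlib.sha1(representation.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")
```

For instance, `proximity_string(".COM", "Corp2", "LocationA", "Div2")` produces the representation given above for sub-ring 542, and `ring_number` of that string gives a 160-bit number that could serve as the ring's identifier.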

For example, a node in sub-ring 544 can implement RouteProximally to identify a closer node in sub-ring 531 (e.g., to a node in sub-ring 513). In turn, sub-ring 531 can implement RouteProximally to identify a closer node in sub-ring 522. Likewise, sub-ring 522 can implement RouteProximally to identify a closer node in sub-ring 511. Similarly, sub-ring 511 can implement RouteProximally to identify a closer node in ring 501. Thus, it may be that a RouteProximally function is recursively invoked with each invocation routing a message closer to the destination.

Thus, when a proximity criterion is taken into account, routing hops on a path to a final destination can remain within the proximity of a node that originates a request, while making significant progress between the originating node and the destination node in a numerical space, until either the destination node is reached or no further progress can be made under the chosen proximity criterion, at which point it is relaxed just enough to make further progress towards the destination. For example, a proximity criterion can be relaxed enough for a message to be routed from ring 531 up to ring 522, etc.

Utilizing the above approach to proximity, it is possible to confine published information to a given ring. For example, organizations may like to ensure that organization specific information is not available to entities outside of their trust domains either (1) implicitly in the form of neighborhood replication to nodes outside of their domains or (2) explicitly in the form of servicing lookup requests for such information. The first aspect is satisfied by replicating published information only among the nodes neighboring the target ID within the specified ring. Because all messages originated by a node are routed by successively climbing the rings to which it belongs towards the root ring, there is a high likelihood that all lookup requests originated within an organization will be able to locate the published information confined to it, thereby implicitly satisfying the second aspect.

Also, organizations dislike nodes automatically federating with nodes outside of their trust domain. This can happen, for example, when a visiting sales person connects his/her laptop computer to the network in the customer premises. Ideally, the laptop computer belonging to the sales person wishes to locate information published in its home domain and/or federate with the nodes in its home domain starting at its lowest preferred proximity ring. It will typically not be permitted to federate with the nodes in the customer's domain. Supporting this scenario requires the ability to locate seed nodes in the home domain. Such seed nodes can be used for locating information published in the home domain, to join the home federation, to selectively import and export published information across domains, and as one possible way to arbitrate conflicting failure reports submitted by other nodes. Seed nodes are also sometimes referred to as message gateways.

In other embodiments, an entity publishes references to seed nodes in the root ring. Seed nodes can be published at the unique number (such as the one obtained by hashing its representational string) associated with the ring (as a target ID). Seed node information can further be on-demand cached by the nodes in various rings that are on the path to the corresponding target IDs in the root ring. Such on-demand caching provides for improved performance and reduction in hotspots that might occur when semi-static information is looked up quite frequently. Seed node information can also be obtained via other means such as DNS.

To provide fault tolerance for confined published information, each node can maintain a set of neighborhood nodes in all of the rings it participates in. Given the above, the state maintained by a node can be summarized as follows:

-   An ID which is a numerical value uniformly distributed in the range of 0 to b^(n)-1.
-   A routing table consisting of (all arithmetic is done modulo b^(n)):
    -   For each ring, say ring d, in which the node participates
        -   Successor node (s_(d))
        -   Predecessor node (p_(d))
        -   Neighborhood nodes (p_(kd), . . . , p_(1d), p_(d), s_(d), s_(1d), . . . , s_(jd)) such that s_(jd).s_(d).id>(id+u/2), j≧v/2-1, p_(kd).p_(d).id<(id-u/2), and k≧v/2-1.
    -   Routing nodes (r_(−(n−1)), . . . , r_(−1), r_(1), . . . , r_(n-1)) such that r_(±i)=RouteProximally(id±b^(i), updateMsg, d) such that s_(d)≦id+b^(i)≦s_(d+1) or p_(d+1)≦id-b^(i)≦p_(d) as appropriate.
-   where b is the number base, n is the field size in number of digits, u is the neighborhood range, and v is the neighborhood size.

Note that a subset of the neighborhood nodes maintained by a given node in ring “d” can appear again as neighborhood nodes in the child ring “d+1” in which the given node participates as well. As such, one can derive the upper bound on the total number of neighborhood nodes maintained by a given node across all the D rings in which it participates as D*max(u,v)/2. This considers that only one reference to a given node is kept and the worst case upper bound is for a balanced tree.

It should be noted that when a ring is partitioned into a plurality of corresponding sibling sub-rings, it is permitted for a specified node to simultaneously participate in more than one of the plurality of corresponding sibling sub-rings, for example, through aliasing. Aliasing can be implemented to associate different state, for example, from different sub-rings, with the specified node. Thus, although aliases for a given node have the same ID, each alias can have distinct state associated with it. Aliasing allows the specified node to participate in multiple rings having distinct proximity criteria that are not necessarily common ancestors of more specific proximity criteria. That is, the specified node can participate in multiple branches of the proximity tree.

For example, a dual NIC (wired and wireless) laptop can be considered to be proximally equivalent to both other wireless and wired nodes sharing the same LAN segments as the laptop. But, these two distinct proximity criteria can be modeled as sub-criteria that are applicable only after application of a different higher priority proximity criterion, such as, for example, one based on organizational membership. As the laptop belongs to the same organization, the aliased nodes in the two sub-rings representing 1) membership in the wired and 2) membership in the wireless LAN segments merge into a single node in the ring representing the organization to which the laptop belongs. It should be understood that RouteProximally works as expected without any modifications in the presence of aliasing.

Each proximal ring can be configured in accordance with (potentially different) ring parameters. Ring parameters can be used to define a neighborhood (e.g., ring parameters can represent a neighborhood range, a neighborhood size, ping message and depart message timing and distribution patterns for ping and depart messages), indicate a particular federating mechanism (e.g., from among the above-described first through fourth federating mechanisms or from among other federating mechanisms), or define communication specifics between routing partners in the same proximal ring. Some ring parameters may be more general, applying to a plurality of different federating mechanisms, while other ring parameters are more specific and apply to a specific type of federating mechanism.

Ring parameters used to configure a higher level proximal ring can be inherited in some embodiments by lower level proximal rings. For example, it may be that ring 543 inherits some of the ring parameters of ring 531 (which in turn inherited from ring 522, etc.). Thus, a neighborhood size and neighborhood range associated with ring 531 are also associated with ring 541.

However, inherited ring parameters can be altered and/or proximal rings can be individually configured in accordance with different ring parameters. For example, it may be that ring 511 is for an administrative domain that contains a large number of nodes and thus the above-described fourth federating mechanism is more appropriate for ring 511. On the other hand, it may be that ring 521 is for a small business with a relatively smaller number of nodes and thus the above-described second federating mechanism is more appropriate for ring 521. Thus, the ring parameters associated with ring 521 can be set to (or inherited parameters changed to) different values than the ring parameters associated with ring 511. For example, a ring parameter indicating a particular type of federating mechanism can be different between rings 511 and 521. Similarly, parameters defining a neighborhood can be different between rings 511 and 521. Further, ring 521 can be configured in accordance with specific parameters that are specific to the above-described second federating mechanism, while ring 511 is configured in accordance with additional specific parameters that are specific to the above-described fourth federating mechanism.

Accordingly, proximal rings can be flexibly configured based on the characteristics (e.g., number, included resources, etc.) of nodes in the proximal rings. For example, an administrator can select ring parameters for proximal rings using a configuration procedure (e.g., through a user-interface). A configuration procedure can facilitate the configuration of inheritance relationships between proximal rings as well as the configuration of individual proximal rings, such as, for example, to override otherwise inherited ring parameters.
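Inheritance with selective override of ring parameters can be sketched as follows. The parameter names and values are hypothetical, and Python's `collections.ChainMap` is used only as one convenient way to model a lookup that falls back to the parent ring.

```python
from collections import ChainMap

# Hypothetical parameters for ring 511 (a large administrative domain).
ring_511 = ChainMap({
    "federating_mechanism": "fourth",
    "neighborhood_size": 8,
    "neighborhood_range": 16,
})

# Ring 521 (a small business) inherits from ring 511 but overrides two values.
ring_521 = ring_511.new_child({
    "federating_mechanism": "second",
    "neighborhood_size": 4,
})

# Ring 522 overrides nothing, so every lookup falls through to ring 511.
ring_522 = ring_511.new_child({})
```

Looking up `ring_521["neighborhood_range"]` returns the value inherited from ring 511, while `ring_521["federating_mechanism"]` returns the overriding value; the parent's own parameters are left unchanged.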

FIG. 8 illustrates an example flow chart of a method 800 for partitioning the nodes of a federation infrastructure. The method 800 will be described with respect to the rings of partition tree 500 in FIG. 5. Method 800 includes an act of accessing a sorted linked list containing node IDs that have been assigned to nodes in a federation infrastructure (act 801). For example, the sorted linked list represented by ring 501 can be accessed. The node IDs of the sorted linked list (the nodes depicted on ring 501) can represent nodes in a federation infrastructure (e.g., federation infrastructure 100).

Method 800 includes an act of accessing proximity categories that represent a plurality of different proximity criteria for partitioning the sorted linked list (act 802). For example, proximity criteria representing domain boundaries 561, geographical boundaries 562, and organizational boundaries 563 can be accessed. However, other proximity criteria, such as trust domain boundaries, can also be represented in the accessed proximity criteria. Proximity categories can include previously created partially ordered lists of proximity criteria. A ring can be partitioned based on partially ordered lists of proximity criteria.

Method 800 includes an act of partitioning the sorted linked list into one or more first sub lists based on a first proximity criterion, each of the one or more first sub lists containing at least a subset of the node IDs from the sorted linked list (act 803). For example, ring 501 can be partitioned into sub-rings 511, 512, 513, and 514 based on criterion 571. Each of sub-rings 511, 512, 513, and 514 can contain a different sub-set of node IDs from ring 501.

Method 800 includes an act of partitioning a first sub list, selected from among the one or more first sub lists, into one or more second sub lists based on a second proximity criterion, each of the one or more second sub lists containing at least a subset of node IDs contained in the first sub list (act 804). For example, sub-ring 511 can be partitioned into sub-rings 521, 522, and 523 based on criterion 581. Each of the sub-rings 521, 522, and 523 can contain a different sub-set of node IDs from sub-ring 511.
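As a rough illustration of acts 801 through 804, the following sketch partitions a sorted list of node IDs by one proximity criterion and then partitions one resulting sub list by a second criterion. The function name partition, the node attributes, and the class labels are hypothetical; the described embodiments do not specify them.

```python
from typing import Callable, Dict, List


def partition(sorted_ids: List[int],
              criterion: Callable[[int], str]) -> Dict[str, List[int]]:
    """Split a sorted ID list into sub lists keyed by a proximity class.

    Each sub list preserves the order of the parent list, so it remains
    a sorted list in its own right.
    """
    sub_lists: Dict[str, List[int]] = {}
    for node_id in sorted_ids:
        sub_lists.setdefault(criterion(node_id), []).append(node_id)
    return sub_lists


# Hypothetical node attributes; the source does not give these values.
DOMAIN = {2: "a", 30: "a", 46: "b", 50: "a", 64: "b", 76: "a"}
SITE = {2: "east", 30: "west", 50: "east", 76: "west"}

root = sorted(DOMAIN)                              # act 801: sorted list
first = partition(root, DOMAIN.__getitem__)        # act 803: first sub lists
second = partition(first["a"], SITE.__getitem__)   # act 804: second sub lists
```

Because every sub list is derived by filtering the parent list in order, each sub-ring contains a subset of the node IDs of the ring above it.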

FIG. 9 illustrates an example flow chart of a method 900 for populating a node's routing table. The method 900 will be described with respect to the sorted linked list 304 and ring 306 in FIG. 3. Method 900 includes an act of inserting a predecessor node into a routing table, the predecessor node preceding a current node relative to the current node in a first direction of a sorted linked list (act 901). For example, the node having ID 50 can be inserted into the routing table as a predecessor for the node having ID 64 (the current node). Moving in a clockwise direction 321 (from end A of sorted linked list 304 towards end B of sorted linked list 304), the node having ID 50 precedes the node having ID 64. Inserting a predecessor node can establish a symmetric partnership between the current node and the predecessor node such that the current node is a partner of the predecessor node and the predecessor node is a partner of the current node.

Method 900 includes an act of inserting a successor node into the routing table, the successor node succeeding the current node relative to the current node in the first direction in the sorted linked list (act 902). For example, the node having ID 76 can be inserted into the routing table as a successor for the node having ID 64 (the current node). Moving in clockwise direction 321, the node having ID 76 succeeds the node having ID 64. Inserting a successor node can establish a symmetric partnership between the current node and the successor node such that the current node is a partner of the successor node and the successor node is a partner of the current node.

Method 900 includes an act of inserting appropriate neighborhood nodes into the routing table, the neighborhood nodes identified from the sorted linked list in both the first direction and in a second opposite direction based on a neighborhood range and neighborhood size (act 903). For example, the nodes having IDs 83, 76, 50, and 46 can be inserted into the routing table as neighborhood nodes for the node having ID 64 (the current node). Based on a neighborhood range of 20 and a neighborhood size of 4, the nodes having IDs 83 and 76 can be identified in clockwise direction 321 and the nodes having IDs 50 and 46 can be identified in counter-clockwise direction 322 (moving from end B of sorted linked list 304 towards end A of sorted linked list 304). It may be that in some environments no appropriate neighborhood nodes are identified. Inserting a neighborhood node can establish a symmetric partnership between the current node and the neighborhood node such that the current node is a partner of the neighborhood node and the neighborhood node is a partner of the current node.

Method 900 includes an act of inserting appropriate routing nodes into the routing table, the routing nodes identified from the sorted linked list in both the first and second directions based on a number base and field size of the ID space for the federation infrastructure, the routing nodes representing a logarithmic index of the sorted linked list in both the first and second directions (act 904). For example, the nodes having IDs 200, 2, 30, 46, 50, 64, 64, 64, 64, 64, 76, 83, 98, 135 and 200 can be inserted into the routing table as routing nodes for the node having ID 64. Based on the number base of 2 and field size of 8, the nodes having IDs 64, 64, 76, 83, 98, 135 and 200 can be identified in direction 321 and the nodes having IDs 64, 64, 50, 46, 30, 2, and 200 can be identified in direction 322. As depicted inside ring 306, the routing nodes represent a logarithmic index of the sorted linked list 304 in both clockwise direction 321 and counter-clockwise direction 322. Inserting a routing node can establish a symmetric partnership between the current node and the routing node such that the current node is a partner of the routing node and the routing node is a partner of the current node.
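The acts of method 900 can be sketched for the node having ID 64 as follows. Several points are assumptions made for illustration: the ring membership is taken from the node IDs mentioned in the example above rather than from FIG. 3 itself, the routing index runs over exponents 1 through n−1 because that range reproduces the per-direction lists given in the example, and ties between two equally close nodes are broken in favor of the lower ID.

```python
from typing import Dict, List

BASE, FIELD_SIZE = 2, 8          # number base b and field size n from the example
ID_SPACE = BASE ** FIELD_SIZE    # arithmetic is performed modulo b^n (256)
RING = [2, 30, 46, 50, 64, 76, 83, 98, 135, 200]   # assumed node IDs


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two IDs on the circular ID space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def closest(target: int) -> int:
    """Node numerically closest to target; ties go to the lower ID (assumed)."""
    return min(RING, key=lambda n: (ring_distance(n, target), n))


def build_routing_table(current: int, nbr_range: int, nbr_size: int) -> Dict[str, object]:
    i = RING.index(current)
    # Other nodes ordered nearest-first in each direction.
    succ = [RING[(i + k) % len(RING)] for k in range(1, len(RING))]
    pred = [RING[(i - k) % len(RING)] for k in range(1, len(RING))]
    half = nbr_size // 2

    def in_range(n: int) -> bool:
        return ring_distance(n, current) <= nbr_range

    exponents = range(1, FIELD_SIZE)   # assumed index range 1..n-1
    return {
        "predecessor": pred[0],                                    # act 901
        "successor": succ[0],                                      # act 902
        "neighborhood": [n for n in succ[:half] if in_range(n)]    # act 903
                        + [n for n in pred[:half] if in_range(n)],
        "routing_cw": [closest((current + BASE ** e) % ID_SPACE)   # act 904
                       for e in exponents],
        "routing_ccw": [closest((current - BASE ** e) % ID_SPACE)
                        for e in exponents],
    }


table = build_routing_table(64, nbr_range=20, nbr_size=4)
```

Under these assumptions the sketch yields the predecessor 50, the successor 76, the neighborhood nodes 76, 83, 50, and 46, and the same per-direction routing node lists as the example above.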

FIG. 7 illustrates an example flow chart of a method 700 for populating a node routing table that takes proximity criteria into account. The method 700 will be described with respect to the rings in FIG. 5. Method 700 includes an act of inserting into a routing table a predecessor node for each hierarchically partitioned routing ring the current node participates in (act 701). Each predecessor node precedes the current node in a first direction (e.g., clockwise) within each hierarchically partitioned routing ring the current node participates in. The hierarchically partitioned routing rings are partitioned in accordance with corresponding proximity criteria and contain at least subsets of a bi-directionally linked list (and possibly the whole bi-directionally linked list). For example, it may be that a specified node participates in root ring 501 and sub-rings 511, 522, 523, 531, and 542. Thus, a predecessor node is selected for the specified node from within each of root ring 501 and sub-rings 511, 522, 523, 531, and 542.

Method 700 includes an act of inserting into the routing table a successor node for each hierarchically partitioned routing ring the current node participates in (act 702). Each successor node succeeds the current node in the first direction within each hierarchically partitioned routing ring the current node participates in. For example, a successor node is selected for the specified node from within each of root ring 501 and sub-rings 511, 522, 523, 531, and 542.

Method 700 includes an act of inserting into the routing table appropriate neighborhood nodes for each hierarchically partitioned routing ring the current node participates in (act 703). The neighborhood nodes can be identified in both the first direction (e.g., clockwise) and in a second opposite direction (e.g., counter clockwise) based on a neighborhood range and neighborhood size from the hierarchically partitioned routing rings the current node participates in. For example, neighborhood nodes can be identified for the specified node from within each of root ring 501 and sub-rings 511, 522, 523, 531, and 542.

Method 700 includes an act of inserting into the routing table appropriate routing nodes for each hierarchically partitioned routing ring the current node participates in (act 704). For example, routing nodes can be identified for the specified node from within each of root ring 501 and sub-rings 511, 522, 523, 531, and 542.

In some embodiments, appropriate routing nodes are inserted for each proximity ring d, except the leaf ring (or leaf rings in embodiments that utilize aliasing), in which the node Y participates. Appropriate routing nodes can be inserted based on the following expression(s):

if Y.s_(d).id<Y.id+b^(i)<Y.s_(d+1).id is true, then use ring d; or

if Y.p_(d).id<Y.id−b^(i)<Y.p_(d+1).id is true, then use ring d.

If a ring has not been identified in the previous step, use the lead ring (e.g., ring 501) as ring d. Now, ring d is the proximity ring in which node Y should look for the routing partner closest to z, where z=Y.id±b^(i).

FIG. 10 illustrates an example flow chart of a method 1000 for routing a message towards a destination node. The method 1000 will be described with respect to the sorted linked list 304 and ring 306 in FIG. 3. Method 1000 includes an act of a receiving node receiving a message along with a number indicating a destination (act 1001). For example, the node having ID 64 can receive a message indicating a destination of 212.

Method 1000 includes an act of determining that the receiving node is at least one of numerically further from the destination than a corresponding predecessor node and numerically further from the destination than a corresponding successor node (act 1002). For example, in direction 322, ID 64 is further from destination 212 than ID 50 and, in direction 321, ID 64 is further from destination 212 than ID 76. Method 1000 includes an act of determining that the destination is not within a neighborhood set of nodes corresponding to the receiving node (act 1003). For example, the node with ID 64 can determine that destination 212 is not within the neighborhood set of 83, 76, 50, and 46.

The method 1000 includes an act of identifying an intermediate node from a routing table corresponding to the receiving node, the intermediate node being numerically closer to the destination than other routing nodes in the corresponding routing table (act 1004). For example, the node having ID 64 can identify the routing node having ID 200 as being numerically closer to destination 212 than other routing nodes. The method 1000 includes an act of sending the message to the intermediate node (act 1005). For example, the node having ID 64 can send the message to the node having ID 200.
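The decision made in acts 1002 through 1004 can be sketched as a single function. This is a simplified illustration under stated assumptions: distance is measured as the shortest distance around a 256-ID space, the neighborhood test is reduced to a membership check, and the function name next_hop and its parameters are hypothetical.

```python
ID_SPACE = 256  # assumes number base 2 and field size 8, as in the FIG. 3 example


def ring_distance(a, b):
    """Shortest distance between two IDs on the circular ID space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def next_hop(current, destination, predecessor, successor,
             neighborhood, routing_nodes):
    """Return the node the message should go to next, or current to keep it."""
    here = ring_distance(current, destination)
    # Act 1002: if neither adjacent node is closer, the current node keeps it.
    if here <= min(ring_distance(predecessor, destination),
                   ring_distance(successor, destination)):
        return current
    # Act 1003 (simplified): a destination in the neighborhood set is
    # reached directly.
    if destination in neighborhood:
        return destination
    # Act 1004: otherwise choose the routing node numerically closest to it.
    return min(routing_nodes, key=lambda n: ring_distance(n, destination))


hop = next_hop(64, 212, predecessor=50, successor=76,
               neighborhood=[83, 76, 50, 46],
               routing_nodes=[200, 2, 30, 46, 50, 64, 76, 83, 98, 135])
```

With the values from the example, the node having ID 64 selects the node having ID 200 as the intermediate node for destination 212.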

FIG. 11 illustrates an example flow chart of a method 1100 for routing a message towards a destination node based on proximity criteria. The method 1100 will be described with respect to the rings in FIG. 4 and FIG. 5. Method 1100 includes an act of a receiving node receiving a message along with a number indicating a destination and a proximity criterion (act 1101). The proximity criterion defines one or more classes of nodes. The receiving node receives the message as part of a current class of nodes selected from among the one or more classes of nodes based on the proximity criterion. For example, the node having ID 172 can receive a message indicating a destination of 201 and a proximity criterion indicating that the destination node be part of classes represented by ring 401. The node having ID 172 can receive the message as part of ring 404.

Method 1100 includes an act of determining that the receiving node is at least one of numerically further from the destination than a corresponding predecessor node and numerically further from the destination than a corresponding successor node, among nodes in a selected class of nodes (act 1102). For example, within ring 404, the node with ID 172 is further from destination 201 than the node having ID 174 in the clockwise direction and is further from destination 201 than the node having ID 153 in the counterclockwise direction.

Method 1100 includes an act of determining that the destination is not within the receiving node's neighborhood set of nodes for any of the one or more classes of nodes defined by the proximity criterion (act 1103). For example, the node having ID 172 can determine that destination 201 is not in a corresponding neighborhood set in ring 404 or in ring 401.

Method 1100 includes an act of identifying an intermediate node from the receiving node's routing table, the intermediate node being numerically closer to the destination than other routing nodes in the routing table (act 1104). For example, the node having ID 172 can identify the node having ID 194 as being numerically closer to destination 201 than other routing nodes in ring 404. The method 1100 includes an act of sending the message to the intermediate node (act 1105). For example, the node having ID 172 can send the received message to the node having ID 194. The node having ID 172 can send the received message to the node having ID 194 to honor a previously defined partially ordered list of proximity criteria.

Node 194 may be as close to destination 201 as is possible within ring 404. Thus, proximity can be relaxed just enough to enable further routing towards the destination to be made in ring 401 in the next leg. That is, routing is transitioned from ring 404 to ring 401 since no further progress towards the destination can be made on ring 404. Alternately, it may be that the node having ID 201 is within the neighborhood of the node having ID 194 in ring 401 resulting in no further routing. Thus, in some embodiments, relaxing proximity criteria to get to the next higher ring is enough to cause further routing.

However, in other embodiments, incremental relaxation of proximity criteria causing transition to the next higher ring continues until further routing can occur (or until the root ring is encountered). That is, a plurality of transitions to higher rings occurs before further routing progress can be made. For example, referring now to FIG. 5, when no further routing progress can be made on ring 531, proximity criteria may be relaxed enough to transition to ring 511 or even to root ring 501.
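The incremental relaxation of proximity criteria can be sketched as a loop that stays in the most specific ring while progress is possible and moves to the next higher ring only when it is not. The sketch is heavily simplified: every node is assumed to know all members of each ring (rather than only its routing partners), the ID space is assumed to hold 256 IDs, and the ring memberships are hypothetical apart from the IDs 153, 172, 174, 194, and 201 mentioned above.

```python
ID_SPACE = 256  # assumed size of the ID space


def ring_distance(a, b):
    """Shortest distance between two IDs on the circular ID space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def route(start, destination, rings):
    """Route in the most specific ring first, relaxing proximity when stuck.

    rings lists ring memberships from most specific to least specific.
    Returns the hops taken as (ring index, node) pairs.
    """
    node, level, hops = start, 0, []
    while node != destination and level < len(rings):
        candidates = [n for n in rings[level] if n != node]
        best = min(candidates, key=lambda n: ring_distance(n, destination),
                   default=node)
        if ring_distance(best, destination) < ring_distance(node, destination):
            node = best                  # progress is possible in this ring
            hops.append((level, node))
        else:
            level += 1                   # no progress: relax to the next ring
    return hops


# Hypothetical ring memberships for illustration.
ring_404 = [153, 172, 174, 194]
ring_401 = [153, 172, 174, 194, 201]
hops = route(172, 201, [ring_404, ring_401])
```

In this sketch the message first moves from the node having ID 172 to the node having ID 194 within ring 404, and only then transitions to ring 401 to reach the node having ID 201.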

Node Phases

A node participating in a federation infrastructure can operate in different operational phases. Valid phase values for a node can be defined to be members of an ordered set. For example, {NodeId}.{InstanceIds}.{Phase Value [Phase-State Values: Inserting, Syncing, Routing, Operating]. [Phase.Unknown Indication: phase known at time of transmission, phase unknown at time of transmission]} defines one possible ordered set representing a phase-space of a given node within a federation infrastructure. A node instance can transition (or advance) through the node phase-states from Inserting to Syncing to Routing to Operating in order. Further, in some embodiments, a node instance can be configured such that the node instance is prevented from transitioning back to a prior node phase-state. In some embodiments, a node advances its instance ID each time the node comes up.

For example, a node instance can be prevented from transitioning from Routing back to Syncing (or back to Inserting), etc. Accordingly, in some embodiments, when it is known that a given node instance (e.g., identified by (NodeId, InstanceId)) has advanced to a particular node phase-state (e.g., Operating), it is also known that the given node instance is not likely to (and in some embodiments will not) revert to a prior node phase-state (e.g., back to Routing, Syncing, or Inserting). Thus, there is a significant likelihood that any node instance in a node phase prior to the particular node phase-state is a new (and advanced) instance of the node.

In some embodiments, phase information and corresponding instance IDs (which advance as a node comes up) are transferred together. Thus, it is possible to determine that a lesser node phase-state for the same instance is older. Further, when a newer node instance is known (at any phase-state value), any information about older instances is considered out of date.

From time to time, nodes can reboot or lose communication with one another, such as, for example, when first starting up, through a graceful departure, or as a result of abnormal termination (crash). Thus, there is the potential for a node in any node phase-state to reboot or lose communication with other nodes. For example, a crash can cause a node in a Routing phase-state to reboot. During a reboot or loss of communication, there may be no way to determine what node phase-state a node is in. Accordingly, when a node is rebooting or communication to a node is lost, a [Phase.Unknown Indication] can be set to indicate that the phase-state for the node is currently not known. However, any previously expressed and/or detected phase-state for the node can be maintained and is not lost.

The [Phase.Unknown Indication] can be used to indicate whether a phase-state was known at the time a phase-state value was transmitted (e.g., phase value with phase.unknown not set) or if a phase-state is a previously expressed phase-state and the phase-state was not known at the time the phase-state was transmitted (e.g., phase value with phase.unknown set). Thus, the phase of a node (its phase value) can be represented using both a phase-state value and a phase.unknown indication.
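One possible way to represent the ordered phase-space described above is sketched below. The type and method names (PhaseState, NodePhase, is_newer_than, mark_unknown) are assumptions introduced for illustration; the sketch only shows that ordering by instance ID first and phase-state second lets a receiver decide which of two reports about a node is more recent.

```python
from dataclasses import dataclass
from enum import IntEnum


class PhaseState(IntEnum):
    INSERTING = 1
    SYNCING = 2
    ROUTING = 3
    OPERATING = 4


@dataclass(frozen=True)
class NodePhase:
    node_id: int
    instance_id: int          # advances each time the node comes up
    state: PhaseState
    unknown: bool = False     # True: state was not known when transmitted

    def is_newer_than(self, other: "NodePhase") -> bool:
        """Compare two reports about the same node."""
        if self.node_id != other.node_id:
            raise ValueError("reports describe different nodes")
        # A newer instance supersedes anything known about older instances;
        # within one instance, a later phase-state is the more recent report.
        return (self.instance_id, self.state) > (other.instance_id, other.state)

    def mark_unknown(self) -> "NodePhase":
        # Keep the last expressed phase-state and only flag it as unverified.
        return NodePhase(self.node_id, self.instance_id, self.state, True)
```

For example, under this ordering a report of instance 2 in the Inserting phase-state is treated as newer than a report of instance 1 in the Operating phase-state.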

Join Protocol

From time to time, nodes can join and depart from existing federations. The nodes can implement appropriate protocols for joining and departing federations. For example, a node can implement a Join() function to become part of an existing federation. A node implementing the Join() function can transition through three ordered phase-states: an inserting phase-state, a synchronizing phase-state, and a routing phase-state before reaching the final operating phase-state. In other embodiments these specific ordered phase-states may not exist while others may be defined. FIG. 12A illustrates an example of a node establishing membership within a federation infrastructure. FIG. 12B illustrates an example of nodes in a federation infrastructure exchanging messages.

Insertion Phase: A node, Y, enters this phase-state by issuing a join message, including at least its node ID and indicating a join action to the federation. A join message can be a routed message sent by a newly joining node (node Y) with its destination property set to the identity of the newly joining node. In this phase-state, a newly-joining node is inserted between its predecessor and successor nodes in the federation. The insertion phase-state can be implemented according to the following algorithm (All arithmetic is performed modulo b^(n)):

IP1. Y identifies an existing node that is already part of a lowest ring from which the joining node wishes to participate in the federation. This can either be statically configured or dynamically discovered using DHCP and/or DNS and/or WS-Discovery or a (potentially well-known) constant. Let this existing federation node be E.

IP2. Y invokes E.RouteNumerically(Y, joinMsg) to determine the node X whose ID is numerically closest to Y.id in every proximity ring in which the node Y participates. This can include routing a join message to multiple nodes.

IP3. Determine the numerical successor (s) and predecessor (p) nodes. (Note that the data needed to do the following insertion can be carried in the join message and its response. As such, there are no additional roundtrips needed.)

Case 1: X.id>Y.id

Y.s=X, Y.p=X.p, X.p.s=Y, and X.p=Y

Case 2: X.id<Y.id

Y.p=X, Y.s=X.s, X.s.p=Y, and X.s=Y
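The two insertion cases of IP3 amount to splicing node Y into a doubly linked ring next to node X. The sketch below follows the pointer assignments in the order given above. It is an illustration only: the Node class and the insert function are hypothetical names, and the comparison of IDs ignores wraparound at the end of the ID space, which a complete implementation operating modulo b^(n) would need to handle.

```python
class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.s = self   # successor (a single node is its own successor)
        self.p = self   # predecessor


def insert(y, x):
    """Link joining node y next to x, the node numerically closest to y.id."""
    if x.id > y.id:            # Case 1: y becomes the predecessor of x
        y.s, y.p = x, x.p
        x.p.s = y
        x.p = y
    elif x.id < y.id:          # Case 2: y becomes the successor of x
        y.p, y.s = x, x.s
        x.s.p = y
        x.s = y
    else:
        raise ValueError("node ID already in use")


# Usage: start from a one-node ring (ID 64) and join IDs 50 and 76.
n64, n50, n76 = Node(64), Node(50), Node(76)
insert(n50, n64)   # Case 1, since 64 > 50
insert(n76, n64)   # Case 2, since 64 < 76
```

After both joins the successor chain is 50, 64, 76 and back to 50, with the predecessor pointers mirroring it.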

In response to the join message, node X (the node that processed the join message) can send a join response back to node Y. The join response can indicate the predecessor node (Y.p) and successor node (Y.s) for node Y. Node Y can receive the join response and process the join response to become aware of its predecessor and successor nodes. After processing the join response, Node Y can be a weak routing participant in the federation. For example, Node Y can simply forward messages sent to it, either to its successor or predecessor nodes. Thus, Node Y is inserted into the federation infrastructure but routing and neighborhood tables are not populated. Before reaching this point, node Y will request other nodes sending it messages to redirect the messages sent to it through a different node by returning a status message to the sending node indicating that node Y's liveness phase is in an inserting phase-state.

Generally, from time to time, nodes can exchange sync request and response messages. Sync request and sync response messages can include liveness information (e.g., headers) for other nodes from the sender's point of view. Neighborhood state can also be included in sync request and response messages such that application layers in a neighborhood are aware of one another's state. One example of when sync request and response messages are exchanged is during a synchronizing phase-state of a joining node. However, sync request and response messages can be exchanged during other operational phase-states as well (e.g., while in the Operating Phase-state).

FIG. 16 depicts an example of a message model and related processing model 1600. As depicted in FIG. 16, a node can send and receive sync request messages. For example, sync request message 1601 can be received at function layer 1651 from a newly inserted node (e.g., the node in FIG. 12B having ID 144). Application data 1602 (e.g., namespace subscriptions) can be piggybacked in sync request message 1601. Function layer 1651 can inform application layer 1652 of any application data included in sync request messages. For example, function layer 1651 can invoke neighborhood state sync event 1603, including application data 1602, to application layer 1652. Sync request 1631, including application data 1607, can also be sent to another node that processes sync request 1631 similar to the processing of sync request 1601 in processing model 1600.

In response to some function layer event (e.g., sync request message 1601, sync response message 1641, or ping message 1612) function layer 1651 can invoke the neighborhood state request function 1604 in application layer 1652. Neighborhood state request 1604 is a request to the application layer to obtain the state that needs to be propagated in the neighborhood. In response to neighborhood state request 1604, application layer 1652 can supply neighborhood state 1606, including optional application data 1607, to function layer 1651. Alternately, application layer 1652 can send neighborhood state 1606, including optional application data 1607, in reaction to some application layer event. Using internal mechanisms similar to the above, function layer 1651 can send sync response message 1608, including optional application data 1607, to propagate application layer neighborhood state.

Synchronization Phase: After processing a join response message, a node Y transitions from the insertion phase-state to synchronizing (Syncing) phase-state. In the synchronization phase-state, the newly-inserted node Y synchronizes information with nodes in the neighborhood. Generally, Node Y can send sync messages to at least its predecessor and successor nodes identified in the insertion phase-state. These nodes processing the sync messages can return sync responses that indicate corresponding neighborhood and routing partner nodes of these processing nodes. In a more specific example, the synchronizing phase-state can be implemented according to the following algorithm (All arithmetic is performed modulo b^(n)):

SP1. Compute the Neighborhood(Y) from the union of Neighborhood(Y.s) and Neighborhood(Y.p) nodes in each proximal ring in which the node Y participates. The union computation can be done as follows:

(s_(j), . . . , s_(1), s, p, p_(1), . . . , p_(k)) such that s_(j).s.id>(Y.id+u/2), j≥v/2−1, p_(k).p.id<(Y.id−u/2), and k≥v/2−1

SP2. Referring briefly to FIG. 16, query Y's local application layer (e.g., application layer 1652) via a neighborhood state request (e.g., neighborhood state request 1604) to obtain optional application specific neighborhood data (e.g., application specific data 1607).

SP3. Send synchronize message to at least the proximal successor and predecessor nodes including at least liveness state information of each proximal neighborhood and routing partner node from Y's perspective. Any optional application specific neighborhood data (e.g., application data 1607) accessed via SP2 is included in the sync request 1631.

SP4. Y receives sync response messages back from those nodes processing sync messages sent in SP3. For example, node Y can exchange synchronize messages (request/response) with one or more nodes within its computed neighborhood. After synchronize messages are exchanged with at least one and potentially all of a node Y's neighborhood nodes, the computed neighborhood nodes can exchange further messages to propagate synchronized data. A synchronization message (request or response) can be a non-routed message sent by a node to proactively synchronize its data with a target node that is, for example, in the node's neighborhood.

SP5. As sync response messages in SP4 are received (e.g., sync response message 1641), any optional application specific neighborhood data present in these received sync response messages (e.g., application data 1622) can be offered to Y's application layer 1652 via neighborhood state sync event 1603.

As part of the synchronizing phase-state, the proximal successor (e.g., Y.s) and predecessor (Y.p) nodes exchange their routing tables with the newly-inserted node (e.g., Y). Nodes that receive sync messages can respond by sending sync responses. Sync responses carry data similar to synchronize messages except from the perspective of the responding node. Both sync messages and sync responses can carry (or piggyback) application data. Thus, application data can be propagated between nodes during the synchronizing phase-state. When the synchronize phase-state is complete, the node can process messages destined for it, instead of simply forwarding them either to a successor or predecessor. However, the node may still be viewed as a weak routing participant because its routing table is not populated.

Routing Phase: After the synchronizing phase-state is completed, a node transitions into the routing phase-state. In the routing phase-state, the newly-synchronized node (e.g., node Y) computes its routing nodes. The routing phase-state can be implemented according to the following algorithm (All arithmetic is performed modulo b^(n)):

RP1. If the routing phase-state is being executed as part of the balancing procedure (explained later), ensure that the successor node (Y.s) and the predecessor node (Y.p) are alive in every proximity ring in which the node Y participates. If either is not alive, determine the replacement node for the failed one(s) by choosing a next best successor or predecessor node among the neighborhood nodes in the ring under consideration.

RP2. For 1<i<n−1

RP2a. Compute z=Y.id±b^(i)

RP2b. If the ring d is not the most specific proximity ring, find the proximity ring d in which the node Y participates and satisfying the condition Y.s_(d).id<Y.id+b^(i)<Y.s_(d+1).id or Y.p_(d).id<Y.id−b^(i)<Y.p_(d+1).id. Else make ring d the most specific proximity ring. Ring d is the proximity ring in which node Y should look for the routing partner closest to z. Let Q be the node numerically closest to z between Y.s_(d).r_(±i) and Y.p_(d).r_(±i). If |Q.id−z| is within a configurable percentage of b^(i) (typically 20%), simply make Y.r_(±i)=Q. If Q.id is closer to z than either (Y.s_(d).id±b^(i)) or (Y.p_(d).id±b^(i)), it means node Y is a better partner routing node to node Q in proximity ring d than either Y.s_(d) or Y.p_(d). Therefore, send updateMsg to node Q, if it has not already been sent, supplying i and node Y as parameters so that node Q can establish node Y as its partner routing node at r_(−i).

RP2c. If this phase-state is being executed as part of the balancing procedure and if Y.s_(d).r_(±i).id==Y.p_(d).r_(±i).id, there is only one node in the numerical range between (Y.s_(d).id±b^(i)) and (Y.p_(d).id±b^(i)). That node is the one pointed to by the routing node r_(±i) of the successor (or predecessor) node. Therefore, simply make Y.r_(±i)=Y.s_(d).r_(±i).

RP2d. Else, compute the routing partner Y.r_(±i) by invoking RouteProximally on node Q with the proximity criterion set to that of ring d. This implies Y.r_(±i)=Q.RouteProximally(z, updateMsg, d).

RP3. At this point, node Y can process not only messages destined for it but can also route messages.

RP4. Subscribe to liveness notification events sent from the application layer for the endpoint IDs of the partner routing nodes, if this has not already been done. Also, revoke any liveness event subscriptions previously established with the application layer for the nodes that are no longer partner routing nodes. For example, subscription and/or revoke requests can be passed up to an application layer (e.g., application layer 121) that implements pub-sub logic for a corresponding application (e.g., a namespace application). When subsequent application specific liveness messages (e.g., those resulting from namespace subscriptions) are received at the application layer, notifications (events) can be pushed down to other lower layers (e.g., other lower layers 131) for processing.
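The bookkeeping in RP4 can be sketched as a set difference between the previous and the newly computed partner routing nodes. The function name and the subscribe and revoke callbacks are hypothetical stand-ins for whatever interface the application layer exposes; they are not part of the described embodiments.

```python
def update_liveness_subscriptions(old_partners, new_partners, subscribe, revoke):
    """Subscribe for newly chosen routing partners and revoke stale ones.

    subscribe and revoke are caller-supplied callbacks (assumed interface).
    Returns the set of partners that are subscribed afterwards.
    """
    old, new = set(old_partners), set(new_partners)
    for endpoint in sorted(new - old):
        subscribe(endpoint)      # not yet subscribed
    for endpoint in sorted(old - new):
        revoke(endpoint)         # no longer a partner routing node
    return new


# Usage with simple recording callbacks.
subscribed, revoked = [], []
current = update_liveness_subscriptions(
    old_partners=[76, 98, 200],
    new_partners=[76, 83, 135, 200],
    subscribe=subscribed.append,
    revoke=revoked.append,
)
```

Partners present in both sets are left untouched, which matches the condition that a subscription is only requested if this has not already been done.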

FIG. 17 depicts an example of a number of liveness interactions that can occur between function layer 1751 and application layer 1752. As depicted in FIG. 17, endpoints are, for example, publish/subscribe topics (e.g., represented by a URL or URI) representing various nodes and can be, for example, federation infrastructure nodes. Subscribe To Liveness Event 1701 can be invoked from function layer 1751 to application layer 1752 to subscribe to a liveness event (e.g., to a publish/subscribe topic). Revoke Liveness Subscription 1702 can be invoked from function layer 1751 to application layer 1752 to revoke a subscription to a liveness event. End Point Down 1703 can be sent from application layer 1752 to function layer 1751 to indicate that an endpoint may be down and provide function layer 1751 with an optional replacement endpoint. End Point Down event 1703 can be sent asynchronously based on a prior subscription (e.g., Subscribe To Liveness Event 1701).

Node Down 1704 can be invoked from function layer 1751 to application layer 1752 to indicate that function layer 1751 (or some other lower layer) has detected a failed node and optionally provide application layer 1752 with a replacement node. Application layer 1752 can subsequently propagate that a potentially failed node was detected to other interested parties. Node down event 1704 can be sent asynchronously anytime function layer 1751 or some other lower layer detects a potentially failed node. Send liveness 1706 can be invoked from application layer 1752 to function layer 1751 when application layer 1752 detects that a node is down (e.g., from node down event 1704 or from some other out-of-band mechanism). Send liveness event 1706 can cause function layer 1751 to send a liveness message. Send liveness event 1706 can also be invoked asynchronously anytime application layer 1752 detects that a node is down and does not depend on any prior established subscriptions (via subscribe to liveness).

Thus, in some embodiments, function layer 1751 is used recursively. For example, function layer 1751 can indicate an interest in a specified node (e.g., is the particular node up or down) to application layer 1752. Application layer 1752 can formulate an application specific subscription for notifications related to the specified node and then reuse function layer 1751 to communicate the formulated subscription to appropriate corresponding application layer 1752 instances in other federation nodes. For example, if the application layers 1752 within federation nodes implement namespace pub/sub behaviors, function layer 1751 can route the subscription to a publish/subscribe manager that manages notifications for the specified node, the publish/subscribe manager being implemented as at least part of application layer 1752 in the related federation nodes. Accordingly, function layer 1751 is used to route a subscription that function layer 1751 caused to be generated. Similar recursive mechanisms can also be used to unsubscribe or otherwise indicate that there is no longer an interest in the specified node.

Operating Phase: After the routing phase-state is completed, a node transitions into the operating phase-state. The node can remain in an operating phase-state until it goes down (e.g., rebooting). In the operating phase-state, the node can send update messages to routing partners from time to time. Update messages (both update requests and update responses) can include neighborhood node liveness information for the sending nodes (e.g., for all proximal neighborhoods of interest). This sent liveness information can also include the sender's own liveness information. Update messages can be routed messages originated by nodes to periodically update their routing partner nodes. Application data can be piggybacked on update messages such that application data can be propagated during routing partner updates. The message destination is set to the identity of the perfect routing partner at the desired routing index. The Message ID property of this message is assigned an application sequence number so as to enable the node(s) processing this message to determine the latest message, and this message is routed proximally.

A node that receives an update message can respond with an update response. An update response carries the same data as the update message except that the data is from the perspective of the responding node. Through the exchange of update messages and update responses, nodes can exchange routing information. From time to time, operational nodes can update routing partners.

From time to time, operational nodes can also send ping messages (e.g., ping messages 1609 and 1611). A ping message is a one-way message sent by a node to periodically announce its presence and disseminate information within its neighborhood about its neighborhood/routing nodes and replicate (e.g., piggybacked) application data.

An origin node can send a ping message to one or more of its immediate predecessor and successor neighborhood nodes. Thus, depending on the ping distribution pattern (i.e., which nodes are sent ping messages), information related to the origin node is propagated to other nodes on a ring within the neighborhood of the origin node. For example, the origin node can send a ping message only to its immediate predecessor and successor nodes and the ping message propagates outward from the position (node ID) of the origin node along the ring in both directions to the edge of the origin's neighborhood. Alternately, the origin node can send a ping message to every n^(th) node in its neighborhood in both its predecessor and successor directions.

Each node receiving a ping message checks its interest in the origin node from a neighborhood range perspective. If not interested, it discards the ping message. If interested, it processes the ping message and forwards the ping message according to its specified ping pattern if such forwarding is constrained to the neighborhood of the originating node. For example, after processing a ping message, a receiving node can forward the ping message to at least its successor node if the sending and origin nodes are in its predecessor node set, or to at least its predecessor node if the sending and origin nodes are in its successor set.

Thus, the outward propagation of ping messages stops when the message reaches the edge of the neighborhood node set around the origin node. The Message ID property of a ping message is assigned an application sequence number so as to enable the nodes processing this message to determine the latest message from the origin node and avoid duplicate processing or otherwise unneeded forwarding.
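The receive-side handling of ping messages can be sketched as follows. This is a minimal Python illustration, not the described implementation: the `PingFilter` class, the `forward_direction` function, and the use of plain sets for the predecessor and successor node sets are assumptions made for the example. Stopping at the edge of the origin's neighborhood is not modeled.

```python
from typing import Dict, Optional, Set


class PingFilter:
    """Hypothetical helper: remembers the latest sequence number per origin."""

    def __init__(self) -> None:
        self._latest: Dict[int, int] = {}

    def is_new(self, origin_id: int, seq: int) -> bool:
        # Process a ping only if its sequence number is newer than anything
        # already seen from the same origin node.
        if seq <= self._latest.get(origin_id, -1):
            return False
        self._latest[origin_id] = seq
        return True


def forward_direction(predecessors: Set[int], successors: Set[int],
                      sender_id: int, origin_id: int) -> Optional[str]:
    """Pick the direction that carries the ping away from its origin."""
    if sender_id in predecessors and origin_id in predecessors:
        return "successor"
    if sender_id in successors and origin_id in successors:
        return "predecessor"
    # Neither case applies (an assumption of this sketch): do not forward.
    return None
```

A receiving node would call `is_new` first and consult `forward_direction` only for pings that pass the duplicate check.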

Referring back to FIG. 16, ping message 1609 can be received at function layer 1651 from a neighborhood node. Application data 1612 (e.g., namespace subscriptions) can be piggybacked in ping message 1609. Function layer 1651 can inform application layer 1652 of any application data included in ping messages. Similarly, function layer 1651 can inform application layer 1652 of any application data included in Sync Request messages. Both of these cases of transference can be accomplished via sending a neighborhood state sync event 1603, including application data 1612, to application layer 1652.

In response to some function layer event (e.g., received ping message 1609), function layer 1651 can send neighborhood state request 1604 to application layer 1652. Neighborhood state request 1604 is invoked on the application layer 1652 to obtain the state that needs to be optionally propagated in the neighborhood. In response to neighborhood state request 1604, application layer 1652 can return neighborhood state 1606, including optional application data 1607, to function layer 1651. Function layer 1651 can send ping message 1611, including optional application data 1607, to propagate neighborhood and routing partner node liveness information as well as optional application layer neighborhood state. Function layer 1651 can also send sync response 1608, including optional application data 1607, to propagate application state.

Departure Protocol

When it is appropriate for a node to depart from a federation, the node can implement a Depart function to be gracefully removed from the federation. A node departs an existing federation by sending a departure message to one or more of its immediate proximal predecessor and successor nodes, and possibly other nodes in the same proximal neighborhood. Thus, depending on the departure distribution pattern (i.e., which nodes are sent departure messages), information related to the departing node is propagated to other nodes on a ring within the neighborhood of the departing node. A departure message is a one-way message originated by a gracefully departing node to inform one or more other nodes within at least one of its proximal neighborhoods about its impending departure. The departing node propagates the depart message (e.g., within its neighborhood) in a manner similar to the propagation of the ping messages. For example, the node having ID 30 can send depart messages 1219 to the nodes having IDs 17 and 40. The node having ID 30 can then remove itself from the federation infrastructure from the standpoint of a given proximal ring. Note that it is possible for a node to remove itself from one proximal neighborhood but not others to which it may belong.

Since the nodes having IDs 17 and 40 (i.e., the predecessor and successor nodes) are likely to be the closest nodes to ID 30 after the node having ID 30 is removed, the nodes having IDs 17 and 40 are made aware of the departure of the node having ID 30. Thus, future messages that are to be delivered to ID 30 can be appropriately processed at the nodes having IDs 17 and 40. The nodes having IDs 17 and 40 can propagate the departure of the node having ID 30 to the other nodes on ring 1206. In the absence of the node having ID 30, the nodes having IDs 17 and 40 can also recompute predecessor and successor pointers, potentially pointing to each other.
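The pointer recomputation after a graceful departure can be illustrated with a short Python sketch. The function names and the four-node membership list are hypothetical; only IDs 17, 30, and 40 come from the example above, and ID 64 is added so that the ring wraps around.

```python
from typing import List, Tuple


def ring_neighbors(ring_ids: List[int], node_id: int) -> Tuple[int, int]:
    """Return (predecessor, successor) of node_id on a sorted circular ring."""
    ids = sorted(ring_ids)
    i = ids.index(node_id)
    # Index -1 wraps to the largest ID; the modulo wraps to the smallest.
    return ids[i - 1], ids[(i + 1) % len(ids)]


def depart(ring_ids: List[int], departing_id: int) -> List[int]:
    """Return the ring membership after departing_id has left."""
    return [n for n in ring_ids if n != departing_id]


ring = [17, 30, 40, 64]          # illustrative membership only
after = depart(ring, 30)
# With 30 gone, 17 and 40 point to each other.
print(ring_neighbors(after, 17))
print(ring_neighbors(after, 40))
```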

The Message ID property of a depart message is assigned the same application sequence ID as that of Ping messages so as to enable the nodes processing the depart message to determine the latest message among a series of ping and depart messages sent by an origin node. Graceful departure from a federation proximal ring is optional but encouraged. However, the federation is designed to self-heal if nodes leave abruptly.

Liveness

During the lifetime of a federation, nodes can exchange liveness information to maintain the federation. Liveness information can be included in virtually any message that is exchanged within a federation in the form of Liveness Message Headers. For example, join messages, join responses, sync messages, sync responses, update messages, update responses, application specific messages, liveness messages, and ping messages can all include liveness information headers. When a federation node sends any message or response, the node can include liveness information for processing by other nodes. Liveness information can be included in a liveness information header of a liveness message.

Liveness information indicating the liveness state of a node can be represented using the following properties:

-   [Node]: Identifies the node whose liveness state is being represented. A node can be identified based on [Reference Properties] that further include an [Instance ID].
    -   [Reference Properties]: Element information items specified in the WS-addressing specification. WS-addressing defines the [Instance ID] reference property for inclusion in the reference property set.
        -   [Instance ID]: A number that identifies a particular instance of a node. An incrementing boot count can be used as the instance ID of a node.
-   [Phase]: Conveys the phase of the identified node.
    -   [Phase-State Value]: Conveys the highest phase-state (inserting, synchronizing, routing, operating) that the indicated node instance was known to have achieved.
    -   [Phase.Unknown Indication]: An indicator that conveys if the current phase is known or unknown.
-   [Freshness]: Conveys the freshness of the information; its value ranges from 0 to MaxFreshness. The higher the value, the fresher the information, with 0 implying no information. MaxFreshness is a protocol defined constant.
-   [Color]: Identifies the proximity equivalence class to which the node belongs. Two nodes with the same color value are always considered to be proximally closest because they both belong to the same equivalence class identified by the color value. The number of proximity equivalence classes can increase over time as more nodes join the federation.
-   [Weight]: Supplies the node capability metric; its value ranges from 0 to MaxWeight. It measures the desirable characteristics of a federation node such as large computational power, high network bandwidth, and long uptime. The higher the value, the more capable the node is, making it more desirable from a partnership perspective.

In some environments, the [Node] and [Freshness] properties of a node are either implicitly or explicitly conveyed in a larger scope such as the [Origin] and [Sender] message headers, and as such inclusion of the above properties again in the liveness headers will be duplicative. For example, the sender of a message need only convey its current phase, color, and weight information, as its ID and Instance ID are supplied in the message addressing headers and its Freshness is implied.

Liveness state can be at least partially ordered based on a “<” binary relation defined as follows:

“L1<L2” is true if

1.  “L1.[Node].[Name]==L2.[Node].[Name]” is true and one of the following is true, with the tests performed and short-circuited in the order listed:
    -   L1.[Node].[Reference Properties].[Instance ID] < L2.[Node].[Reference Properties].[Instance ID]
    -   L1.[Phase.Unknown Indication] != true AND L2.[Phase.Unknown Indication] != true AND L1.[Phase-State] < L2.[Phase-State]
    -   L1.[Freshness] < L2.[Freshness]
2.  Or “L1.[Color]==L2.[Color]” is true and one of the following is true, with the tests performed and short-circuited in the order listed:
    -   L1.[Phase-State] < L2.[Phase-State]
    -   L1.[Weight] < L2.[Weight]
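One literal reading of the “<” relation can be written as a small Python predicate. This is a sketch under several assumptions: the `Liveness` record and its field names are hypothetical, the color test in clause 2 is taken to be an equality comparison, and the phase-states are given increasing numeric values in the order they are listed (inserting, synchronizing, routing, operating). Each clause is evaluated as a short-circuited OR of its tests. A stricter lexicographic reading is also plausible and would give different answers when, for example, the instance ID is higher but the freshness is lower.

```python
from dataclasses import dataclass
from enum import IntEnum


class PhaseState(IntEnum):
    # Assumed numeric ordering, following the order the phase-states are listed.
    INSERTING = 1
    SYNCHRONIZING = 2
    ROUTING = 3
    OPERATING = 4


@dataclass(frozen=True)
class Liveness:
    name: str
    instance_id: int
    phase_state: PhaseState
    phase_unknown: bool
    freshness: int
    color: int
    weight: int


def liveness_less_than(l1: Liveness, l2: Liveness) -> bool:
    """Literal OR-of-tests reading of the 'L1 < L2' definition."""
    # Clause 1: same node; Python's `or` short-circuits in the listed order.
    same_node = l1.name == l2.name and (
        l1.instance_id < l2.instance_id
        or (not l1.phase_unknown and not l2.phase_unknown
            and l1.phase_state < l2.phase_state)
        or l1.freshness < l2.freshness
    )
    # Clause 2: same proximity equivalence class (color).
    same_color = l1.color == l2.color and (
        l1.phase_state < l2.phase_state or l1.weight < l2.weight
    )
    return same_node or same_color
```

Records for different nodes with different colors are never ordered by this predicate, which is consistent with the relation being only a partial order.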

Further, a liveness “down” message can be sent to a specified node when it is detected or suspected that the specified node has become unavailable (e.g., gone down). As an example, when an application layer (e.g., application layer 121) detects that another application layer (e.g., application layer 123) or a node hosting that other application layer is down, the detecting application layer can notify other lower layers (e.g., other lower layers 131) that the node may be down, for example, in accordance with message model and related processing models 1600 and/or 1700. Such a notification can cause other lower layers, such as, for example, function layer 1651, to send a liveness down message. This is only one example of a stimulus for the generation of liveness down messages.

Since liveness down messages are routed and thus delivered to a node closest to those nodes suspected of being down, if a liveness down message for a specified node gets delivered back to the specified node, then either the specified node never went down or the specified node is a different instance (e.g., with a different instance ID). On the other hand, if the liveness down message gets delivered to another node, it indicates the specified node does appear to have gone down. Accordingly, if the node receiving the liveness down message views itself as being in the proximal neighborhood of the specified node, it may source a departure message for the specified node into that proximal neighborhood as described, as well as indicating to its application layer (e.g., using Node Down 1704) that the specified node may be down and that the receiving node is its replacement. A liveness down message for the specified node can be routed proximally with its target ID set to that of the node that may be down.

Balancing Procedure

Embodiments of the present invention are designed to accommodate large numbers of nodes joining and departing the federation in a short period of time. Such changes in the network can cause routing delays if the logarithmic search trees maintained at the various nodes become unbalanced, that is, if there are more nodes on one side of a ring than the other. To facilitate optimal routing efficiency, nodes participating in a federation execute the balancing procedure when certain criteria are met.

For example, when any of the following conditions are true, any node can execute the balancing procedure to ensure a balanced routing table for optimal routing efficiency:

-   A configured number of liveness messages described above were received.
-   A configured amount of time has elapsed since the receipt of the last liveness message described above.
-   The neighborhood has changed in the sense that some new nodes have arrived or some existing nodes have departed.

Balancing the routing tables is a simple process. For example, nodes with an unbalanced routing table can re-execute the Synchronization and Routing phase-states of the Join protocol.
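The three trigger conditions for the balancing procedure can be captured in a small bookkeeping object. The sketch below is illustrative only: the `BalanceTrigger` class, its field names, and the use of seconds for the idle timeout are assumptions, and the balancing step itself (re-executing the Synchronization and Routing phase-states) is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class BalanceTrigger:
    liveness_threshold: int          # assumed: configured message count
    idle_timeout: float              # assumed: configured time, in seconds
    liveness_seen: int = 0
    last_liveness_at: float = 0.0
    neighborhood_changed: bool = False

    def on_liveness(self, now: float) -> None:
        # Record receipt of one liveness message.
        self.liveness_seen += 1
        self.last_liveness_at = now

    def on_neighborhood_change(self) -> None:
        # A node arrived in, or departed from, the neighborhood.
        self.neighborhood_changed = True

    def should_balance(self, now: float) -> bool:
        # Any one of the three conditions is sufficient.
        return (
            self.liveness_seen >= self.liveness_threshold
            or (now - self.last_liveness_at) >= self.idle_timeout
            or self.neighborhood_changed
        )

    def reset(self, now: float) -> None:
        # Called after the balancing procedure has run.
        self.liveness_seen = 0
        self.last_liveness_at = now
        self.neighborhood_changed = False
```

A node would check `should_balance` periodically, run the balancing procedure when it returns true, and then call `reset`.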

Acts RP2b, RP2d and RP4 combined with 1) finding the closest routing node to a number, 2) the departure protocol followed by the nodes leaving a federation gracefully, and 3) the balancing procedure followed by the nodes receiving liveness messages result in a faster healing system when federating nodes join and depart the network fairly quickly and in large numbers.

Status Messages

A status message is a non-routed message sent by a receiver node to a sender node to report the routing success or failure of a correlated message that the sender node previously forwarded to the receiver node. FIG. 18 depicts an example of how messages forming part of a request-response message exchange pattern are routed across nodes on a ring. A status message can include headers that identify the original correlated message whose routing status is being reported. As such, status messages can be used between nodes to indicate that a message was successfully routed from one node to the next. For example, routing request message 1811 from node 1801 to node 1806 includes sending request 1811 through nodes 1802, 1803, 1804, and 1805. Corresponding cascading success status messages (status 1817, 1818, 1819, 1820 and 1821) can be sent from node 1806 to node 1805, from node 1805 to node 1804, from node 1804 to node 1803, from node 1803 to node 1802, and from node 1802 to node 1801 respectively. In response to request 1811, response 1816 can be sent end-to-end from node 1807 to node 1801. Response 1816 is optional and may not exist in a one-way message exchange pattern.
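The relationship between the forward hops of a routed request and the cascading status messages can be shown with a few lines of Python. The function names are hypothetical, and the node numbers are simply the reference numerals from the example above, used here as identifiers.

```python
from typing import List, Tuple


def forward_hops(path: List[int]) -> List[Tuple[int, int]]:
    """(sender, receiver) pairs for each hop of a routed request."""
    return list(zip(path, path[1:]))


def cascading_status(path: List[int]) -> List[Tuple[int, int]]:
    """(from, to) pairs for the status messages, last hop first.

    Each receiver reports routing status back to the node that forwarded
    the request to it, so the statuses retrace the path in reverse.
    """
    return [(receiver, sender) for sender, receiver in reversed(forward_hops(path))]


path = [1801, 1802, 1803, 1804, 1805, 1806]
for src, dst in cascading_status(path):
    print(f"status from node {src} to node {dst}")
```

For a six-node path this yields five status messages, matching the five status numerals (1817 through 1821) in the example.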

FIG. 13 illustrates an example flow chart of a method 1300 for a node to join the federation infrastructure. The method 1300 will be described with respect to ring 1206 in FIGS. 12A and 12B. Method 1300 includes an act of issuing a join message to a federation infrastructure (act 1301). For example, the node having ID 144 can issue join 1201 to federation infrastructure including ring 1206. Method 1300 includes an act of receiving a join message from a joining node (act 1308). For example, an existing node in the federation infrastructure including ring 1206 can receive join 1201.

Method 1300 includes an act of routing a join message to a processing node (act 1309). The processing node can be a node having an ID numerically closer to the ID of the joining node than other active nodes in the federation infrastructure at the time the join message is being routed. For example, join 1201 can initially be received at the node having ID 64, routed to the node having ID 135, and routed to the node having ID 151.
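Selecting the processing node amounts to finding the active node whose ID is numerically closest to the joining node's ID. The Python sketch below assumes a circular ID space of 256 values and breaks ties toward the lower ID; neither detail is stated here. The list of active IDs is an illustrative subset drawn from the examples in this description, not the full membership of ring 1206.

```python
from typing import Iterable

ID_SPACE = 256  # assumption: 8-bit circular ID space


def ring_distance(a: int, b: int, id_space: int = ID_SPACE) -> int:
    """Shortest distance between two IDs on a circular ID space."""
    d = abs(a - b) % id_space
    return min(d, id_space - d)


def closest_node(active_ids: Iterable[int], target_id: int,
                 id_space: int = ID_SPACE) -> int:
    """Active node numerically closest to target_id (ties go to the lower ID)."""
    return min(active_ids, key=lambda n: (ring_distance(n, target_id, id_space), n))


active = [17, 30, 40, 64, 135, 151, 174, 210, 218]  # illustrative subset
print(closest_node(active, 144))  # node 151 is 7 away, node 135 is 9 away
```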

Method 1300 includes an act of computing one or more predecessor nodes and one or more successor nodes for the joining node (act 1310). For example, the node having ID 151 can compute an immediate predecessor node and an immediate successor node for the node having ID 144. Within ring 1206, the node having ID 151 can compute that the node having ID 135 is an immediate predecessor node and that the node having ID 151 is an immediate successor node. Similar computations can be made for other proximal rings.

Method 1300 includes an act of computing one or more routing nodes for the joining node (act 1311). For example, the node having ID 151 can compute routing nodes (from the perspective of the node having ID 151) for the node having ID 144. Within ring 1206, the node having ID 151 can compute, for example, that the nodes having IDs 218 and 40 are routing nodes for the node having ID 144. Similar computations can be made for other proximal rings.

Method 1300 includes an act of sending a join response to the joining node (act 1312). A join response can identify all the predecessor and successor neighborhood and routing partner nodes for the joining node as computed by the processing node given its current view of the federation infrastructure. For example, join response 1202 can identify at least the node having ID 135 as the immediate predecessor node to the node having ID 144, can identify the node having ID 151 as the immediate successor node to the node having ID 144, and can identify any routing nodes (for the node having ID 144) computed at the node having ID 151 for node ID 144 (the newly joining node).

Method 1300 includes an act of receiving a join response from a federation node that processed the join message (act 1302). For example, the node having ID 144 can receive join response 1202 from the node having ID 151.

Method 1300 includes an act of sending a sync request to at least each of the immediate proximal predecessor nodes and immediate proximal successor nodes (act 1303). For example, referring now to FIG. 12B, the node having ID 144 can send sync requests 1203 to the nodes having IDs 135 and 151. Sync request 1203 can include an identification of any neighborhood nodes of the node having ID 144 and/or an identification of any routing partners of the node having ID 144.

The nodes having IDs 135 and 151 can receive the sync requests 1203. In response to receiving sync requests 1203, the nodes having IDs 135 and 151 can identify their neighborhood and routing partner nodes from corresponding routing tables. The nodes having IDs 135 and 151 can include their identified neighborhood and routing partner nodes' liveness information in sync response 1204 and send sync responses 1204 to the node having ID 144.

Method 1300 includes an act of receiving a sync response from each of the proximal predecessor and successor nodes (act 1304). For example, the node having ID 144 can receive sync responses 1204 from the nodes having IDs 135 and 151. Sync response 1204 can include liveness information for one or more nodes on ring 1206 or other rings in a federation infrastructure. Sync response 1204 can also identify any prospective routing partner nodes for the node having ID 144.

Method 1300 includes an act of computing neighbor nodes (act 1305). For example, the node having ID 144 can compute corresponding neighborhood nodes based on the union of the neighborhood nodes for the nodes having IDs 135 and 151. Neighborhood nodes can be computed based on a summarized view of the join response message and any sync response messages.

Method 1300 includes an act of computing routing nodes (act 1306). For example, the node having ID 144 can compute routing nodes from among the nodes of ring 1206. Routing partners can be computed based on a summarized view of the join response message and any sync response messages.

Method 1300 includes an act of exchanging at least neighborhood node information with computed routing partners (act 1307). For example, the node having ID 144 and the node having ID 218 (a computed routing partner) can exchange state information (e.g., instance ID, phase-state, etc.) corresponding to their respective neighborhood nodes. These exchanges are accomplished by the newly joining node sourcing (routing) an Update message to at least each unique computed routing partner as described in the Routing Phase-state text above. The nodes processing the Update message will send a corresponding Update response message in reaction to the receipt of these update messages from the newly joining node. The Update response includes at least the liveness information for itself and its neighborhood nodes.

Method 1300 can also include an act of initiating an initial propagation of routing tables to at least one neighborhood node. For example, the node having ID 144 can include computed neighborhood and routing partner nodes in a ping message and send the ping message to the node having ID 174 (e.g., one of the computed neighborhood nodes). The node having ID 174 can receive the ping message and update a corresponding routing table with the liveness information originated at the node having ID 144. The node having ID 174 can also include its corresponding routing table in a second ping message and send the second ping message at some future point to the node having ID 144. The node having ID 144 can receive the second ping message and can update its corresponding routing table with nodes in the liveness information included in the second ping message (i.e., nodes in the routing table of the node having ID 174). The node having ID 144 can repeat the sending of ping messages with other neighborhood nodes in ring 1206.

It should be understood that when a newly joining node joins a federation, the newly joining node may not find an existing federation member and thus becomes the sole member. Thus, there may be no predecessor, successor, or neighbor nodes assigned for the newly joining node. Accordingly, the newly joining node is mapped as the best routing partner in all cases.

Further, although the method 1300 has been described with respect to a single ring (ring 1206), it should be understood that in some embodiments a node that joins one ring inherently also joins one or more other rings. For example, referring briefly back to FIG. 5, a node that joins ring 551 inherently also joins rings 543, 531, 522, 511, and 501. Thus, method 1300 can be implemented to join a plurality of rings. In other embodiments some or all of the acts in method 1300 may be repeated when joining multiple rings. For example, referring again to FIG. 5, one or more of the acts of 1300 can be repeated when a node joins both ring 551 and ring 514 (e.g., aliasing). In any event, a joining node ID can be accessed and used to identify a joining node in a sorted linked list as well as corresponding hierarchically partitioned sub-lists the joining node is to participate in. A receiving node is identified from the sorted linked list and each partitioned sub-list. The join message is routed to a processing node (e.g., based on ID) in the sorted linked list and each partitioned sub-list. A join response is received from the processing node in the sorted linked list and each partitioned sub-list.

FIG. 14 illustrates an example flow chart of a method 1400 for a node to maintain membership in a federation infrastructure. The method 1400 will be described with respect to ring 1206. Method 1400 includes an act of sending a first ping message to a neighborhood node (act 1401). The first ping message indicates that a current node sending the first ping message is a neighbor of the neighborhood node. The first ping message can also include routing partner and neighborhood nodes' state of the current node. For example, the node having ID 144 can send a ping message to the node having ID 151. Upon receiving the first ping message, the node having ID 151 is made aware that the node having ID 144 is a neighbor of the node having ID 151. Node 151 may also discover newer liveness information (for other nodes on ring 1206) from node 144 as a side effect of this act.

Ping messages can be periodically repeated at a specified frequency based on, for example, configuration state associated with a proximal ring into which the ping message is to be sent. The frequency can be varied depending on the configuration state. For example, a specified ping frequency for a WAN can be different than the specified frequency for a LAN. Ping messages can also be sent in accordance with a ping distribution pattern. The ping distribution pattern for an originating node can indicate that ping messages are to be sent to neighborhood nodes in both directions on a ring. For example, the node having ID 144 can send pings both in the direction of the node having ID 135 and in the direction of the node having ID 151. Ping distribution patterns and frequencies can be varied, for example, per proximity ring.

Method 1400 includes an act of receiving a second ping message from the neighborhood node (act 1402). The second ping message indicates to the current node at least that the neighborhood node originating the second ping message is a neighbor of the current node. The second ping message can also include routing partner and neighborhood nodes' state of the originating neighborhood node. For example, the node having ID 151 can send a second ping message to the node having ID 144. Upon receiving the second ping message, the node having ID 144 is made aware that the node having ID 151 is a neighbor of the node having ID 144. The second ping message can also include liveness information for other nodes on ring 1206. Thus generally, ping messages can be exchanged within a neighborhood and can be used to maintain neighborhood membership (for each proximal membership) and an approximated common neighborhood view of node presence within the federation.

A received ping message can be periodically repeated/forwarded to other nodes within the proximal neighborhood into which the ping was originated (sent by the originating node). Forwarded ping messages can also be sent in accordance with a ping distribution pattern. The ping distribution pattern for a forwarding node can indicate that ping messages are to be sent to neighborhood nodes in a direction away from an originating node. For example, the node having ID 151 can forward pings originating at the node having ID 144 in the direction of the node having ID 174. Ping forwarding distribution patterns can be varied, for example, per proximity ring.

Nodes can be configured to receive ping messages at corresponding intervals. When expected ping messages are not received, a node may interpret this as a communications failure and set the phase.unknown indication to true for the node that should have originated the expected, but at least late, ping message.
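A simple way to picture this timeout handling is a periodic sweep over per-peer records. The Python sketch below is an assumption-laden illustration: the `PeerRecord` type, the `mark_late_peers` function, and the single shared `expected_interval` are hypothetical, and real intervals may differ per proximal ring.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PeerRecord:
    last_ping: float             # time the last ping from this peer arrived
    phase_unknown: bool = False  # mirrors the phase.unknown indication


def mark_late_peers(peers: Dict[int, PeerRecord], now: float,
                    expected_interval: float) -> List[int]:
    """Set phase_unknown for every peer whose expected ping is overdue."""
    late = []
    for node_id, record in peers.items():
        if now - record.last_ping > expected_interval:
            record.phase_unknown = True
            late.append(node_id)
    return late


peers = {151: PeerRecord(last_ping=0.0), 135: PeerRecord(last_ping=9.0)}
print(mark_late_peers(peers, now=10.0, expected_interval=5.0))
```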

Method 1400 includes an act of proximally routing an update request message to a perfect routing node (act 1403). The update request message indicates to the routing node receiving such a routed update request that the current node is participating as a routing partner of the receiving routing node. The update request message can also include at least the current node's neighborhood nodes' identities (e.g., in the form of liveness information). For example, the node having ID 144 can route update message 1216 to the node having ID 208 (the perfect routing partner offset by 64 from 144). Because node 210 (a previously computed routing node) is closest to 208, it will receive and process the routed update request. Upon receiving update message 1216, the node having ID 210 is made aware (or is reinforced) that the node having ID 144 is a routing partner of the node having ID 210.
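The arithmetic in this example (144 offset by 64 gives 208, which resolves to node 210) can be reproduced in a few lines of Python. The 256-value circular ID space, the tie-break toward the lower ID, and the list of active IDs are assumptions made for illustration; only the IDs 144, 208, and 210 and the offset of 64 come from the example.

```python
from typing import Iterable

ID_SPACE = 256  # assumption: 8-bit circular ID space


def perfect_partner_id(node_id: int, offset: int, id_space: int = ID_SPACE) -> int:
    """ID of the perfect routing partner at the given offset."""
    return (node_id + offset) % id_space


def resolve_partner(active_ids: Iterable[int], node_id: int, offset: int,
                    id_space: int = ID_SPACE) -> int:
    """Active node (other than node_id) closest to the perfect partner ID."""
    target = perfect_partner_id(node_id, offset, id_space)

    def distance(n: int) -> int:
        d = abs(n - target) % id_space
        return min(d, id_space - d)

    candidates = [n for n in active_ids if n != node_id]
    return min(candidates, key=lambda n: (distance(n), n))


active = [17, 30, 40, 64, 135, 144, 151, 174, 210, 218]  # illustrative subset
print(perfect_partner_id(144, 64))       # 208
print(resolve_partner(active, 144, 64))  # 210
```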

Method 1400 includes an act of receiving an update response message from the processing (receiving) routing node (act 1404). The update response indicates to the current node that the processing routing node is participating as a routing partner of the current node. The update response message can also include at least the processing routing partner's neighborhood nodes' identities. For example, the node having ID 210 can send update response 1207 to the node having ID 144. Upon receiving update response 1207, the node having ID 144 is made aware that the node having ID 210 is a routing partner of the node having ID 144.

Method 1400 can also include an act of appropriately updating node information to indicate that the current node and the neighborhood node are participating as neighbors and that the current node and the neighborhood node are participating as routing partners. For example, the node having ID 144 can update node information corresponding to the node having ID 151 to indicate that the nodes having IDs 144 and 151 are participating in a (proximal) neighborhood. Similarly, the node having ID 144 can update node information corresponding to the node having ID 210 to indicate that the nodes having IDs 144 and 210 are participating as routing partners.

In some embodiments, application state saved at a specified node X is replicated among its Neighborhood(X) nodes using a reliable-flooding protocol. Each item in the application state has an assigned owner, which could be the endpoint that created the item. Each item in the application state also has an associated timestamp (a.k.a. sequence number) given by its owner. The timestamp has at least three components:

-   Instance ID (e.g., an unsigned-integer) of the owning entity. Must be at least monotonically (>1) increasing.
-   Sequence ID (e.g., a URI) identifying the particular sequence generated by an owner. This component allows the same owner to generate multiple independent sequences.
-   Ordinal number (e.g., an unsigned-integer) identifying the offset within the identified application sequence ID.

Item timestamps are used to detect the latest information associated with the corresponding item during replication because item timestamps generate at least a partial-order with <Instance ID, Sequence ID, and Offset> triples. The timestamp associated with an item being replicated is compared against the local one, if any, to detect the latest one. Item timestamps are also used to support idempotent semantics of create/update/delete operations. For example, when a node receives a request to update an existing item in the application state, the update is accepted only if the timestamp associated with the update request is higher than the one associated with the local item. Conflict resolution techniques based on vector timestamps can be utilized where items cannot be assigned a single owner. Application state replication provides fault-tolerance and facilitates load-balancing requests across neighborhood nodes.
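The timestamp comparison and the idempotent update rule can be sketched in Python. The comparison rule below is one plausible interpretation, not a rule stated here: a higher instance ID wins; within the same instance, ordinals are compared only when the sequence IDs match; otherwise the two timestamps are treated as incomparable and the update is rejected. The type and function names, and the example URIs, are hypothetical.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class ItemTimestamp(NamedTuple):
    instance_id: int   # instance ID of the owning entity
    sequence_id: str   # e.g., a URI naming one sequence of the owner
    ordinal: int       # offset within that sequence


def is_newer(candidate: ItemTimestamp, local: ItemTimestamp) -> Optional[bool]:
    """True/False when the timestamps are comparable, None when they are not."""
    if candidate.instance_id != local.instance_id:
        return candidate.instance_id > local.instance_id
    if candidate.sequence_id != local.sequence_id:
        return None  # assumed: independent sequences are not ordered
    return candidate.ordinal > local.ordinal


Store = Dict[str, Tuple[str, ItemTimestamp]]


def apply_update(store: Store, key: str, value: str, ts: ItemTimestamp) -> bool:
    """Accept the update only if its timestamp is higher than the local one."""
    current = store.get(key)
    if current is not None and is_newer(ts, current[1]) is not True:
        return False  # stale, duplicate, or incomparable: leave state unchanged
    store[key] = (value, ts)
    return True
```

Because a replayed request carries the same timestamp as the stored item, applying it a second time leaves the state unchanged, which is the idempotent behavior described above.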

As an optional behavior, nodes not detecting (after a period of time) an expected Update or Ping from (origin) other partner (routing and/or partner) nodes can consider the phase-state unknown, set a phase.unknown indication to true, and report it as such to other 3^(rd) party nodes. In other words, periodic generation of updates and pings can be required. This requirement and actual timeout values can be an attribute of various proximal rings. For example, a ring can have more restrictive timing requirements for some sub-rings (e.g., in a LAN segment), and node failure detection/reporting is relatively quick. On the other hand, a ring can have less restrictive timing requirements (or even no timing requirements) for other sub-rings (e.g., on the Internet), and proactive node failure detection/reporting is relatively slow (or doesn't exist).

FIG. 15 illustrates an example flow chart of a method 1500 for discovering liveness information for another node. The method 1500 will be described with respect to ring 1206 in FIGS. 12A and 12B. Generally, any message, such as, for example, sync 1203, sync response 1204, update 1216, update response 1207, etc., can include at least one liveness header. In some embodiments, a liveness header includes a <node ID, instance ID, phase [phase-state value].[phase.unknown indication], freshness value, a color (proximity) value, and a weight value> for a node. In other embodiments, a liveness header includes <a phase [phase-state value].[phase.unknown indication], freshness value, a color (proximity) value, and a weight value>. In these other embodiments, liveness headers can be used to augment addressing headers that already include node ID and instance ID for sender and origin nodes. Since the addressing headers already include node ID and instance ID, this information can be omitted from the liveness header.

Method 1500 includes an act of receiving a liveness header representing state information for a node participating in a federation infrastructure (act 1501). The liveness header includes at least a received participating node ID, a received node's instance ID, a received phase value, and a received freshness value. For example, the node having ID 144 can receive a first liveness header in sync response 1204 from the node having ID 151. The first liveness header can include a <participating node ID, an instance ID, phase value [phase-state value].[phase.unknown indication], a freshness value, a color (proximity) value, and a weight value> for the node having ID 174. The phase-state value (e.g., Inserting, Syncing, Routing, Operating) identifies the expressed phase of the node having ID 174 at the time of the first freshness value. The phase value (e.g., phase-state: [Inserting, Syncing, Routing, Operating], and phase.unknown) identifies the expressed and/or detected phase information of the node having ID 174 at the time indicated by the first freshness value.

However, a freshness value can be discounted due to communication delay. A freshness value can also decay with the passage of time. The decay curves for a freshness value can differ (and may not be linear or symmetric) for the different phase states (including unknown). Thus, across different node phases, the decay of a freshness value can be non-linear and/or asymmetric.

Method 1500 includes an act of accessing at least a current instance ID, current phase value, and current freshness value for the participating node maintained at the current node (act 1502). For example, the node having ID 144 can access a previously received and stored instance ID, phase value [phase-state value].[phase.unknown indication], and freshness value for the node having ID 174.

Method 1500 includes an act of comparing at least the received instance ID, received phase value, and received freshness value to the current instance ID, the current phase value, and the current freshness value respectively at a current node (act 1503). For example, the node having ID 144 can compare the previously received and stored instance ID, phase value [phase-state value].[phase.unknown indication], and freshness value for the node having ID 174 to the instance ID, phase value [phase-state value].[phase.unknown indication], and freshness value received in the liveness header.

The node having ID 144 can determine that current state information for the node having ID 174 (e.g., received from the node having ID 151) is stale based on (in order) the first instance ID being greater than the currently stored instance ID for the node having ID 174, based on the first phase-state value being more advanced than the currently stored phase-state value for the node having ID 174, or based on the first freshness value being a value greater than the freshness value currently stored for the node having ID 174. The node having ID 144 can also determine that at least one phase.unknown indication (either currently stored or received in the liveness header) indicates that a phase-state was known at the time the phase-state was detected/transmitted.

Method 1500 includes an act of determining if state information for the participating node is to be updated at the current node based on the comparison (act 1504). For example, based on the comparison of values for the node having ID 174, the node having ID 144 can determine that state information for the node having ID 174 is to be updated. Updating outdated state information for the node having ID 174 can include replacing current stored values (e.g., for instance ID, phase-state value, phase.unknown indication, or freshness value) with values included in the liveness header. For example, the node having ID 144 can update state information for the node having ID 174 to indicate that the node having ID 174 has transitioned to a more advanced phase-state.
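Acts 1502 through 1504 can be sketched as an ordered comparison followed by a conditional replace. The sketch below is illustrative only: `Liveness`, `is_newer`, and `merge` are hypothetical names, the strictly lexicographic reading of "(in order)" is an assumption, and the color value, weight value, and freshness decay are omitted for brevity.

```python
from dataclasses import dataclass

# Phase-states in order of advancement, as listed in the text.
PHASES = ["Inserting", "Syncing", "Routing", "Operating"]


@dataclass(frozen=True)
class Liveness:
    node_id: int
    instance_id: int
    phase_state: str
    phase_unknown: bool
    freshness: int


def is_newer(received: Liveness, current: Liveness) -> bool:
    """Compare in order: instance ID, then phase-state, then freshness."""
    if received.instance_id != current.instance_id:
        return received.instance_id > current.instance_id
    r = PHASES.index(received.phase_state)
    c = PHASES.index(current.phase_state)
    if r != c:
        return r > c
    return received.freshness > current.freshness


def merge(table: dict, received: Liveness) -> bool:
    """Replace stored state for a node when the received header is newer."""
    current = table.get(received.node_id)
    if current is None or is_newer(received, current):
        table[received.node_id] = received
        return True
    return False
```

For example, the node having ID 144 could keep one `table` entry per known node and call `merge` for every liveness header it receives about the node having ID 174.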

In some embodiments, it can be detected that communication with the participating node may have been lost. For example, the node having ID 144 can detect that communication with the node having ID 151 has been lost. Referring briefly to FIG. 17, in response to a prior subscription for liveness events 1701 (with an endpoint of the node having ID 151), application layer 1752 can send endpoint down event 1703 (with an endpoint of the node having ID 151) to function layer 1751. In these embodiments, such detected liveness conditions can be indicated in liveness information with the Phase.Unknown indicator being set to true along with the last known Phase state value.

Method 1500 can further include an act of receiving a message that includes a second liveness header from a second different node in the federation infrastructure. For example, the node having ID 144 can receive a status message (from the node having ID 103 or some other node of ring 1206) that includes a second liveness header. The second liveness header can include <the participating node ID, a second instance ID, a second phase value [phase-state value].[phase.unknown indication], a second freshness value, a second color (proximity) value, and a second weight value> for the node having ID 174. The second phase value (e.g., phase-state: [Inserting, Syncing, Routing, Operating], and phase.unknown indication) identifies the expressed/detected phase of the node having ID 174 at the time of the second freshness value.

Alternately, subsequent to receiving the first liveness header, the node having ID 144 can attempt to communicate directly with the node having ID 174. If communication is successful, the node having ID 174 can return a message (e.g., sync response) having the node ID and second instance ID in an addressing header and having a liveness header including <the second phase value, the second freshness value, the second color (proximity) value, and the second weight value>. If a failure is detected, the node having ID 144 generates an internal liveness state change (e.g., freshness=max, and phase.unknown indication=true) and processes the state change as if the state change were received from another node. Such a state change has the highest freshness value.

Method 1500 can also include an act of comparing the second instance ID, the second phase value, and the second freshness value to the current instance ID, the current phase value, and the current freshness value respectively (act 1506). For example, after receiving a status message from the node having ID 103, the node having ID 144 can determine that current state information for the node having ID 151 is stale based on (in order) the second instance ID being greater than the first instance ID, the second phase value being more advanced than the first phase value, or the second freshness value being greater than the first freshness value.

Method 1500 can also include an act of determining if state information for the participating node is to be updated based on the comparison. For example, based on the comparison of values for the node having ID 174, the node having ID 144 can determine that state information for the node having ID 174 is to be updated. Updating outdated state information for the node having ID 174 can include replacing current stored values (e.g., for instance ID, phase-state value, phase.unknown indication, or freshness value) with values included in the second liveness header. For example, the node having ID 144 can update state information for the node having ID 174 to indicate that the node having ID 174 has transitioned to a more advanced phase-state.

In some embodiments, phase values are compared within the context of equal color values. As previously described, a node can participate in multiple proximity rings. Participation in multiple proximity rings can occur as a result of participation in a more specific ring implying participation in a more general ring (along a common spine). For example, referring back to FIG. 5, a node's participation in ring 532 also implies that the node is participating in rings 522, 511, and 501. Thus, a color for a more specific ring also represents all parent proximal rings. Also as previously described, participation in multiple proximity rings can occur when a node in one ring is aliased into one or more other rings (potentially along different spines). For example, still referring to FIG. 5, a node participating in ring 532 can be aliased into ring 531 (or even ring 541 that would imply participation in rings 531, 522, 511, and 501). Thus, a color for one ring (e.g., ring 531) can be viewed as a peer color (or proximity) of another ring (e.g., ring 532).

When a node participates in a plurality of proximity rings in an aliased fashion, there is some potential that phase values (e.g., phase-state values and/or phase.unknown indications) for the node will differ between different proximity rings. Thus, a node that receives state information for another node identifies the corresponding proximity ring for the state information (color) before determining if current state information is to be updated for that node and color. For example, the node having ID 144 can identify the corresponding proximity ring for received state information corresponding to the node having ID 174 before comparing the received state information to current state information.

Identifying an appropriate proximity ring can include comparing a received color value to one or more current color values. When the received color value and a current color value are equal, other state information, such as, for example, a current instance ID, a current phase value, and a current freshness value, can be compared to corresponding received state information, such as, for example, a received instance ID, a received phase value, and a received freshness value. On the other hand, when the received color value and a current color value differ, further comparisons do not occur.

Equality between color values can result in a variety of ways. For example, equality between color values can result when a current color value and a received color value indicate the same proximity ring (e.g., ring 532). Further, equality between color values can result when a more specific color value is compared to a corresponding parent color value (e.g., another ring along the same spine). For example, comparing the color value for ring 532 to the color value for ring 511 (or ring 522 or 501) can result in equality. Thus, the child proximity is the parent proximity but is more specific.

Thus generally, currently operational nodes in a federation infrastructure can exchange expressed and detected liveness state information for other nodes even when communication with those other nodes appears to be lost.

Bootstrapping Mechanisms

Generally, in order for a node to become an active member of a federation (e.g., join), the node has to communicate with at least one other node that is already an active member of the leaf ring it intends to join. To help ensure this initial form of communication is available, federations can utilize a bootstrapping mechanism. A bootstrapping mechanism can be used as a last resort when other types of communication fail to identify an active member of a leaf ring or security constraints require a newly joining node to initially communicate with at least one of a set of special nodes such as seed nodes. That is, when other types of communication fail or because of security requirements, a bootstrapping mechanism can be used to identify an active member node of a leaf ring.

In some embodiments, seed nodes are used to bootstrap communication with a federation. Seed nodes provide well known points of entry for some types of cross (inter) proximity communication. Seed nodes help heal ring partitions due to infrastructure failure/recovery and general dynamism. Each ring can have at least one operational seed node in order to provide basic bootstrapping properties for a federation.

Peer seed nodes can communicate amongst themselves to maintain a ring structure (e.g., a doubly linked list) for a proximity that consists of at least all active seed nodes for that proximity. A dedicated seed node synchronization protocol can be used to provide each seed node with at least total knowledge of all other seed nodes' presence (active) state. An active seed node is a member node of the proximity leaf ring in which it is homed as well as all other ancestral rings of the leaf ring. Thus, a seed node can represent an entire spine of proximity rings, for example, from the seed node's leaf ring to the root ring. Accordingly, seed nodes can function as highly available and well known entry nodes in each of those proximity rings. As a result, presence state about seed nodes can be useful for various forms of communication (e.g., inter-proximal communication) within a federation. Accordingly, seed nodes can provide a number of special properties, such as, for example, acting as well known “join points” for joining nodes, acting as a secure ring authority, aiding in healing infrastructure partitions, and acting as a stable “entry node” for each of their proximities.

To provide presence data, a seed node's arrivals and orderly departures can be registered as a stable entry node at a rendezvous point in each of their proximities. For example, registration messages can be routed to a fixed URI whose destination ID is the SHA-1 hash of the string “Proximity:/”. While in one embodiment seed nodes acting as stable entry nodes register themselves in this manner, there are other embodiments where selected non-seed nodes may also register themselves in the same manner and with the same or similar protocols described here for seed nodes. When a stable entry node (such as a seed node) registers, the stable entry node can indicate each ring it is a member of. Thus, information maintained at the rendezvous point identified by this fixed URI is essentially a list of stable entry nodes and their corresponding ring memberships. Accordingly, any node can refer to the rendezvous point identified by this fixed URI to obtain a list of available stable entry nodes and their ring memberships.
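A rough sketch of the registration idea follows. Only the use of SHA-1 over the string “Proximity:/” comes from the text; the UTF-8 encoding, the big-endian integer interpretation of the digest, and the helper names `rendezvous_id`, `register`, `unregister`, and `entry_nodes_for` are assumptions introduced for illustration.

```python
import hashlib


def rendezvous_id(key: str = "Proximity:/") -> int:
    """Map the fixed registration string to a 160-bit destination ID."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()  # encoding assumed
    return int.from_bytes(digest, "big")                 # byte order assumed


def register(registry: dict, node_id: int, rings: list) -> None:
    """Record a stable entry node together with every ring it is a member of."""
    registry[node_id] = set(rings)


def unregister(registry: dict, node_id: int) -> None:
    """Record an orderly departure of a stable entry node."""
    registry.pop(node_id, None)


def entry_nodes_for(registry: dict, ring: int) -> list:
    """List registered stable entry nodes that are members of the given ring."""
    return sorted(n for n, rings in registry.items() if ring in rings)
```

The `registry` dictionary stands in for the state kept at the rendezvous point: a list of stable entry nodes and their ring memberships that any node can query.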

In one embodiment, the stable entry node directly registers these arrival and departure events. In another embodiment, the stable entry node registers these events directly at a rendezvous point within its immediate proximity ring, and that rendezvous point transparently facilitates (directly or indirectly) updating of all other appropriate rendezvous points in each of the remaining proximity rings to which the registering/unregistering stable entry node belongs. The application state sequencing and propagation properties of a federation can be used to maintain and propagate this stable entry node registration information. For example, a reliable-flooding protocol can be used to replicate saved application state among a node's Neighborhood nodes.

The promotion of a stable entry node's presence data towards the root ring allows other nodes in a federation to look up at least one entry node in every proximity. Entry Node Lookup can be facilitated by routing a node lookup message towards the above determined rendezvous point in the Lowest Common Ancestor Ring (“LCAR”) of the leaf ring of the node performing the lookup and the desired proximity ring. For example, referring to FIG. 5, a node in ring 541 may desire to communicate with a node in ring 533. However, the node in ring 541 may have no direct knowledge of any node in ring 533. Thus, the node in ring 541 can send a Node Lookup Message to ring 522 (the LCAR of ring 541 and ring 533). A rendezvous point node in ring 522 that processes entry node presence information (e.g., caused to exist in the system because of a registration message originated by that entry node) can return a Lookup Response Message with contact information for at least a registered stable entry node in ring 533.
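Finding the LCAR amounts to finding the deepest ring shared by two spines. In the sketch below, each ring is written as the path of ring numbers from the root ring; this encoding and the function name `lcar` are hypothetical. The spine for ring 541 follows the description above, while the spine assumed for ring 533 is inferred from ring 522 being named as the LCAR.

```python
from typing import Optional, Sequence


def lcar(path_a: Sequence[int], path_b: Sequence[int]) -> Optional[int]:
    """Return the deepest ring common to both root-to-leaf paths, if any."""
    common = None
    for ring_a, ring_b in zip(path_a, path_b):
        if ring_a != ring_b:
            break
        common = ring_a
    return common


SPINE_541 = (501, 511, 522, 531, 541)  # from the FIG. 5 discussion
SPINE_533 = (501, 511, 522, 533)       # assumed ancestry for ring 533
```

With these paths, `lcar(SPINE_541, SPINE_533)` evaluates to 522, the ring to which the Node Lookup Message is sent in the example.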

In some embodiments, stable entry nodes are seed nodes configured specifically as stable entry nodes for maintaining presence data for various proximities. In other embodiments, other types of nodes can also function as stable entry nodes maintaining presence data for various proximities and may also be configured to perform other operations. For example, certain other types of nodes may be configured (e.g., by an administrator) as being highly available and thus suitable as a stable entry node (i.e., to be registered as described above). However, the other types of nodes may not include additional seed node functionality (e.g., may not be trusted as a security ring authority). In some embodiments, rendezvous points that maintain entry node presence state for their immediate proximity may register themselves as a stable entry node in the ancestral ring or rings.

Node Monitoring

FIG. 19A illustrates an example ring architecture 1900 that facilitates one node monitoring another node. As depicted, ring architecture 1900 includes at least ring 1901 (and any number of other higher and/or lower level rings (not shown)). Ring 1901 can be configured similar to ring 306 of FIG. 3. However, monitoring can occur on any ring of nodes, including any of the rings in FIGS. 4, 5, 12A, and 12B. FIG. 19A depicts an expanded view of subject node 1902 (having ID=83) and monitor node 1903 (having ID=2). In the depicted embodiment, monitor node 1903 is to monitor subject node 1902. However, any node on ring 1901 can be configured to monitor any other node on ring 1901.

FIG. 20 illustrates an example flow chart of a method 2000 for one node to monitor another node. The method 2000 will be described with respect to the data and components depicted in FIG. 19A.

Method 2000 includes an act of the subject node generating a subject side time-to-live duration value for use in monitoring of the subject node (act 2001). For example, subject node 1902 can establish time-to-live (TTL) duration value 1913. TTL duration value 1913 indicates a duration for which subject node 1902 can assume a monitoring relationship with monitor node 1903 is active.

Method 2000 includes an act of the subject node sending an establish request to the monitor node, the establish request indicative of the subject node requesting that the monitor node monitor the subject node, the establish request including the subject side time-to-live duration value (act 2002). For example, subject node 1902 can send establish request 1922, including TTL duration value 1913, to monitor node 1903.

Method 2000 includes an act of the subject node establishing an existing subject side time-to-die time based on the subject side time-to-live duration value and the time the establish request was sent, wherein the subject node clock reaching the existing subject side time-to-die time, prior to receiving an establish grant from the monitor node, is an indication of the subject node having to transition to a failure state (act 2003). For example, subject node 1902 can establish subject side time-to-die time 1917 based on TTL duration value 1913 and the time the establish request 1922 was sent to monitor node 1903. Subject side time-to-die time 1917 can be a time relative to subject node 1902. If a clock of subject node 1902 reaches subject side time-to-die time 1917, prior to receiving an establish grant from monitor node 1903, subject node 1902 is to transition to a failure state. In some embodiments, when a clock of subject node 1902 reaches subject side time-to-die time 1917, prior to receiving an establish grant from monitor node 1903, a failure state is caused. In other embodiments, other activities occur to transition subject node 1902 into a failure state.
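Acts 2001 through 2003 can be sketched from the subject node's point of view as follows. The class name `SubjectEstablish`, the dictionary shape of the request, and the choice of adding the TTL duration to the send time are assumptions; the description only states that the time-to-die time is based on the TTL duration value and the time the request was sent. The sketch covers the establish phase only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubjectEstablish:
    ttl: float                           # subject side TTL duration value
    time_to_die: Optional[float] = None  # subject side time-to-die (local clock)
    granted: bool = False

    def send_establish(self, now: float) -> dict:
        """Build the establish request and start the subject side countdown."""
        self.time_to_die = now + self.ttl
        return {"type": "establish_request", "ttl": self.ttl}

    def on_establish_grant(self) -> None:
        """Record that the monitor node agreed to monitor this node."""
        self.granted = True

    def must_fail(self, now: float) -> bool:
        """True if the clock reached time-to-die before any grant arrived."""
        return (not self.granted and self.time_to_die is not None
                and now >= self.time_to_die)
```

All times are readings of the subject node's own clock, matching the statement that the time-to-die time is relative to subject node 1902.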

Method 2000 includes an act of the monitor node receiving the establish request from the subject node, the establish request indicative of the subject node requesting that the monitor node monitor the subject node, the establish request including at least the subject side time-to-live duration value, the subject side time-to-live duration value used to determine a subject side time-to-die time at the subject node, wherein the subject node clock reaching the subject side time-to-die time, prior to receiving an establish grant from the monitor node, is an indication of the subject node having to transition to a failure state (act 2004). For example, monitor node 1903 can receive establish request 1922, including TTL duration value 1913, from subject node 1902, TTL duration value 1913 having been used at subject node 1902 to establish subject side time-to-die time 1917.

Method 2000 includes an act of the monitor node deriving a monitor side time-to-live duration value from the subject side time-to-live duration value (act 2005). For example, monitor node 1903 can use TTL duration value 1913 to derive TTL duration value 1919. In some embodiments, monitor node 1903 copies TTL duration value 1913 to derive TTL duration value 1919. In these embodiments, TTL duration value 1913 and TTL duration value 1919 are equal. In other embodiments, monitor node 1903 modifies TTL duration value 1913 to derive TTL duration value 1919. In these other embodiments, TTL duration value 1913 and TTL duration value 1919 differ. For example, monitor node 1903 can increase the value of TTL duration value 1913 to derive TTL duration value 1919 such that TTL duration value 1919 is larger than TTL duration value 1913.

Method 2000 includes an act of the monitor node establishing a monitor side time-to-die time based on the monitor side time-to-live duration value and the time the establish request was received, the monitor node clock reaching the monitor side time-to-die time, prior to receiving a renew request from the subject node, being indicative of a suspected failure of the subject node (act 2006). For example, monitor node 1903 can establish monitor side time-to-die time 1914 based on TTL duration value 1919 and the time establish request 1922 was received. Monitor side time-to-die time 1914 can be a time relative to monitor node 1903. If a clock of monitor node 1903 reaches monitor side time-to-die time 1914, prior to receiving a renew request from subject node 1902, monitor node 1903 suspects subject node 1902 of failure.
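The monitor side of acts 2005 and 2006 can be sketched in the same spirit. The names `derive_monitor_ttl`, `MonitorState`, and `on_establish_request`, the additive `slack` parameter, and the dictionary shape of the request are all hypothetical; adding slack is just one way the monitor node might make its TTL duration value larger than the subject's.

```python
from dataclasses import dataclass


def derive_monitor_ttl(subject_ttl: float, slack: float = 0.0) -> float:
    """Copy the subject side TTL (slack == 0) or enlarge it (slack > 0)."""
    return subject_ttl + slack


@dataclass
class MonitorState:
    monitor_ttl: float  # monitor side TTL duration value
    time_to_die: float  # monitor side time-to-die, on the monitor's own clock

    def suspects_failure(self, now: float) -> bool:
        """True once the monitor clock reaches time-to-die with no renew seen."""
        return now >= self.time_to_die


def on_establish_request(request: dict, received_at: float,
                         slack: float = 0.0) -> MonitorState:
    """Derive the monitor side TTL and time-to-die from an establish request."""
    ttl = derive_monitor_ttl(request["ttl"], slack)
    return MonitorState(monitor_ttl=ttl, time_to_die=received_at + ttl)
```

Because the monitor side time-to-die time is measured from receipt on the monitor's clock, the two nodes never need synchronized clocks; a larger monitor side TTL makes the subject give up before the monitor starts to suspect it.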

Method 2000 includes an act of the monitor node sending an establish grant to the subject node to indicate to the subject node that the monitor node has agreed to monitor the subject node (act 2007). For example, monitor node 1903 can send establish grant 1923 to subject node 1902. Method 2000 includes an act of the subject node receiving an establish grant from the monitor node, the establish grant indicative of the monitor node monitoring the subject node (act 2008). For example, subject node 1902 can receive establish grant 1923 from monitor node 1903. Generally, establish grant 1923 indicates that monitor node 1903 has agreed to monitor subject node 1902. In some embodiments, the establish grant message can include the monitor side TTL duration value. For example, it may be that establish grant 1923 includes TTL duration value 1919.

Alternately, a monitor node can send an establish reject to a subject node to indicate to the subject node that the monitor node has not agreed to monitor the subject node. For example, in response to receiving establish request 1922, monitor node 1903 can alternately (as indicated by the dashed line) send establish reject 1931 to subject node 1902. A subject node can receive an establish reject sent from a monitor node. For example, subject node 1902 can receive establish reject 1931 from monitor node 1903. Establish reject 1931 generally indicates to subject node 1902 that monitor node 1903 has not agreed to monitor subject node 1902.

From time to time (and intermingled between the performance of other operations within ring architecture 1900), a subject node can renew an established monitoring agreement with a monitor node. Generally, the subject node leaves the existing agreement in force (the current subject side time-to-die time) until a new grant is received. However, the subject node can generate a new TTL duration value and derive what an updated time-to-die time would be. The subject node then sends the new TTL duration value to the monitor node. The monitor node receives the new TTL duration value. When appropriate, the monitor node grants the renew request and sends a renew grant back to the subject node. The subject node receives the renew grant. In response to receiving the renew grant, the subject node implements the renewed agreement using the updated time-to-die time as the new current time-to-die time. An example of renewing an established monitoring agreement is described in the remaining acts of method 2000.

Method 2000 includes an act of the subject node sending a renew request to the monitor node prior to the subject node clock reaching the subject side time-to-die time (act 2009). For example, subject node 1902 can send renew request 1915, including TTL duration value 1913, to monitor node 1903 prior to a clock of subject node 1902 reaching subject side time-to-die time 1917. In some embodiments, renew request 1915 does not include a subject side TTL duration value. In these embodiments, continued use of TTL duration value 1913 can be inferred. In other embodiments, TTL duration value 1913 is expressly included in renew request 1915. In yet other embodiments, a different subject side TTL duration value is included in renew request 1915. A new subject side TTL duration value can be generated and used by subject node 1902 in response to configuration changes of subject node 1902 and/or to configuration changes elsewhere in ring 1901 (e.g., changed network conditions).

Node 1902 can also calculate what an updated subject side time-to-die time is to be if a corresponding renew grant responsive to renew request 1915 is received. The calculation can be based at least on the time renew request 1915 was sent and on the subject side TTL duration value related to or associated with renew request 1915.

Method 2000 includes an act of the monitor node receiving a renew request from the subject node subsequent to sending the establish grant message and prior to the monitor node clock reaching the monitor side time-to-die time, the renew request indicating that the subject node has not failed (act 2010). For example, monitor node 1903 can receive renew request 1915 subsequent to sending establish grant 1923 and prior to a clock of monitor node 1903 reaching monitor side time-to-die time 1914. Reception of renew request 1915 can indicate to monitor node 1903 that subject node 1902 has not failed.

Method 2000 can also include an act of the monitor node granting the renew request to the subject node. For example, monitor node 1903 can grant renew request 1915.

Method 2000 includes an act of the monitor node establishing an updated monitor side time-to-die time in response to and based at least on the time the renew request was received, the monitor node clock reaching the updated monitor side time-to-die time, prior to receiving another renew request from the subject node, being indicative of a suspected failure of the subject node (act 2012). For example, monitor node 1903 can establish updated monitor side time-to-die time 1921 in response to and based on the time renew request 1915 was received and the implied or indicated monitor TTL duration value related to or potentially contained in renew request 1915. Updated monitor side time-to-die time 1921 can be a time relative to monitor node 1903. Updated monitor side time-to-die time 1921 can be subsequent to monitor side time-to-die time 1914. However, there is no requirement that monitor side time-to-die time 1914 have occurred before establishing updated monitor side time-to-die time 1921. Thus, it is also possible that updated monitor side time-to-die time 1921 is in fact prior to (or the same as) monitor side time-to-die time 1914. If a clock of monitor node 1903 reaches updated monitor side time-to-die time 1921, prior to receiving another renew request from subject node 1902, monitor node 1903 suspects subject node 1902 of failure.

If no subject side TTL duration value is included in renew request 1915 (and thus TTL duration value 1913 is inferred) or if renew request 1915 expressly includes TTL duration value 1913, monitor node 1903 can also use TTL duration value 1919 to establish updated monitor side time-to-die time 1921. On the other hand, if a subject side TTL duration value other than TTL duration value 1913 is expressly included in renew request 1915, monitor node 1903 can use the other expressly included subject side TTL duration value to derive a new monitor side TTL duration value. From the new monitor side TTL duration value, monitor node 1903 can then establish updated monitor side time-to-die time 1921.
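The choice of TTL duration value on renewal can be summarized in a short sketch. The function name `updated_monitor_time_to_die`, its parameters, and the pluggable `derive` callback are hypothetical; the sketch also assumes, as before, that a time-to-die time is the receive time plus the TTL duration in force.

```python
from typing import Callable, Optional, Tuple


def updated_monitor_time_to_die(
    received_at: float,
    subject_ttl: float,
    monitor_ttl: float,
    renew_ttl: Optional[float] = None,
    derive: Callable[[float], float] = lambda ttl: ttl,
) -> Tuple[float, float]:
    """Return (updated monitor side time-to-die, monitor side TTL in force).

    renew_ttl is the subject side TTL carried in the renew request, or None
    when the request omits it and the existing value is inferred.
    """
    if renew_ttl is not None and renew_ttl != subject_ttl:
        # A different subject side TTL was sent: derive a new monitor side TTL.
        monitor_ttl = derive(renew_ttl)
    return received_at + monitor_ttl, monitor_ttl
```

When the renew request omits the TTL or repeats the existing one, the existing monitor side TTL duration value is reused unchanged.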

Method 2000 includes an act of the monitor node sending a renew grant to the subject node to indicate to the subject node that the monitor node has agreed to continue monitoring the subject node (act 2013). For example, monitor node 1903 can send renew grant 1927 to subject node 1902. Method 2000 includes an act of the subject node receiving a renew grant from the monitor node subsequent to sending the corresponding renew request and prior to the subject node clock reaching the subject side time-to-die time, the renew grant message indicative of the monitor node continuing to monitor the subject node (act 2014). For example, subject node 1902 can receive renew grant 1927 from monitor node 1903 subsequent to sending renew request 1915 and prior to a clock at subject node 1902 reaching subject side time-to-die time 1917. Generally, renew grant 1927 is indicative of monitor node 1903 agreeing to continue to monitor subject node 1902.

Alternately, a monitor node can send a renew reject to a subject node to indicate to the subject node that the monitor node is no longer agreeing to monitor the subject node. For example, in response to receiving renew request 1915, monitor node 1903 can alternately (as indicated by the dashed line) send renew reject 1933 to subject node 1902. A subject node can receive a renew reject sent from a monitor node. For example, subject node 1902 can receive renew reject 1933 from monitor node 1903. Renew reject 1933 generally indicates to subject node 1902 that monitor node 1903 is no longer agreeing to monitor subject node 1902.

Method 2000 includes the subject node transitioning to a previously calculated updated subject side time-to-die time in response to receiving the renew grant, wherein the subject node clock reaching the updated subject side time-to-die time, prior to receiving another renew grant from the monitor node, is an indication of the subject node having to transition to a failure state (act 2015). For example, subject node 1902 can transition to updated subject side time-to-die time 1952 when the corresponding renew grant message is received. Updated subject side time-to-die time 1952 can have been calculated at around the time renew request 1915 was sent to monitor node 1903. Updated subject side time-to-die time 1952 can have been calculated based on the time corresponding renew request 1915 was sent and on the TTL duration related to or associated with renew request 1915. Updated subject side time-to-die time 1952 can be a time (e.g., subsequent, prior, or equal to subject side time-to-die time 1917) relative to subject node 1902.

If TTL duration value 1913 is still the appropriate TTL value, subject node 1902 can also use TTL duration value 1913 to establish updated subject side time-to-die time 1952. If another TTL duration value has been generated, subject node 1902 can also use the other generated TTL duration value to establish updated subject side time-to-die time 1952.

Subsequent to establishment of a current subject side time-to-die time (either 1917 or 1952), it may be that a clock at subject node 1902 reaches the current subject side time-to-die time prior to receiving another renew grant from monitor node 1903. This may result from communication errors between subject node 1902 and monitor node 1903. For example, subject node 1902 may send another renew request subsequent to receiving renew grant 1927 and prior to a clock of subject node 1902 reaching updated subject side time-to-die time 1952. However, due to communication failures, the other renew request does not reach monitor node 1903. Alternately, the other renew request may be received at monitor node 1903, but the corresponding renew grant from monitor node 1903 does not reach subject node 1902 due to communication errors. In either event, a clock at subject node 1902 may reach a current subject side time-to-die time prior to receiving the corresponding renew grant responsive to the other renew request.

Alternately, subject node 1902 can malfunction such that subject node 1902 is prevented from sending another renew request to monitor node 1903 prior to a clock at subject node 1902 reaching updated subject side time-to-die time 1952.

However, whether or not a renew request is sent, if a renew grant is not received prior to a clock at subject node 1902 reaching a current subject side time-to-die time (e.g., updated subject side time-to-die time 1952), subject node 1902 transitions into a failure state.

Referring back to monitor node 1903, it may be that a clock at monitor node 1903 reaches a monitor side time-to-die time (either 1914 or 1921) prior to receiving another renew request from subject node 1902 (either due to a malfunction at subject node 1902 or to communication errors in ring 1901). As a result, monitor node 1903 suspects subject node 1902 of failure. Monitor node 1903 can transition to a timeout state indicative of detecting a suspected failure at another node.
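The subject side and monitor side timing rules described above can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the class names, method names, and the use of plain floating-point timestamps are assumptions, and the monitor is simply handed a monitor side TTL rather than deriving it from the subject's TTL duration value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubjectLease:
    """Subject side view of a monitoring relationship (hypothetical names)."""
    time_to_die: float                          # current subject side time-to-die time
    pending_time_to_die: Optional[float] = None
    failed: bool = False

    def send_renew_request(self, now: float, ttl: float) -> None:
        # The updated time-to-die is calculated when the renew request is
        # sent, from the send time and the TTL duration carried in it.
        self.pending_time_to_die = now + ttl

    def tick(self, now: float) -> None:
        # Reaching the current time-to-die without a grant means failure.
        if now >= self.time_to_die:
            self.failed = True

    def on_renew_grant(self, now: float) -> None:
        # A grant only helps if it arrives before the current time-to-die.
        self.tick(now)
        if not self.failed and self.pending_time_to_die is not None:
            self.time_to_die = self.pending_time_to_die
            self.pending_time_to_die = None


@dataclass
class MonitorLease:
    """Monitor side view of the same relationship (hypothetical names)."""
    time_to_die: float                          # current monitor side time-to-die time
    suspects_subject: bool = False

    def tick(self, now: float) -> None:
        # No renew request before the monitor side time-to-die: suspect failure.
        if now >= self.time_to_die:
            self.suspects_subject = True

    def on_renew_request(self, now: float, monitor_ttl: float) -> bool:
        # Returns True for a renew grant and False for a renew reject.
        self.tick(now)
        if self.suspects_subject:
            return False
        self.time_to_die = now + monitor_ttl
        return True
```

For instance, a subject whose time-to-die is 10.0 and that sends a renew request at 6.0 with a TTL of 10.0 moves its time-to-die to 16.0 if the grant arrives at 8.0, but transitions to a failure state if the grant only arrives at 11.0.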

In other embodiments, a pair of nodes can monitor each other. Thus, a first node can monitor a second node and the second node can also monitor the first node. For example, each node can implement both the subject node side and the monitor node side of method 2000 through communication with the other node.

FIG. 19B illustrates an example ring architecture 1900 that facilitates two nodes monitoring each other.

Node 1971 can generate TTL duration value 1929 for use in monitoring node 1971. Node 1971 can send establish request 1962, including TTL duration value 1929, to node 1972. Node 1971 can also establish subject side time-to-die time 1973 based on TTL duration value 1929. Node 1972 can receive establish request 1962, including TTL duration value 1929, from node 1971. Node 1972 can derive TTL duration value 1949 from TTL duration value 1929. Node 1972 can establish monitor side time-to-die time 1939 based on TTL duration value 1949. Node 1972 can send establish grant 1974 to node 1971. Node 1971 can receive establish grant 1974 from node 1972.

In parallel, node 1972 can generate TTL duration value 1975 for use in monitoring node 1972. Node 1972 can send establish request 1926, including TTL duration value 1975, to node 1971. Node 1972 can also establish subject side time-to-die time 1935 based on TTL duration value 1975. Node 1971 can receive establish request 1926, including TTL duration value 1975, from node 1972. Node 1971 can derive TTL duration value 1953 from TTL duration value 1975. Node 1971 can establish monitor side time-to-die time 1937 based on TTL duration value 1953. Node 1971 can send grant message 1976 to node 1972. Node 1972 can receive grant message 1976 from node 1971.

Alternately, either of nodes 1971 and 1972 can reject an establish request from the other node. For example, node 1971 can reject establish request 1926. Likewise, node 1972 can reject establish request 1962. When either node rejects an establish request, it can send an establish reject (e.g., similar to establish reject 1931) to the other node. This indicates to the other node that no monitoring agreement has been established.

Node 1971 and node 1972 can then exchange renew requests and renew grants (as well as renew rejects similar to renew reject 1933) as previously described. Accordingly, each of node 1971 and node 1972 is both a subject node and a monitor node. Based on the depicted TTL duration values and time-to-die times in FIG. 19B, various events may occur during and/or after the monitor relationships are established.

If a clock at node 1971 reaches subject side time-to-die time 1973 prior to receiving a renew grant from node 1972, node 1971 transitions to a failure state. If a clock at node 1972 reaches monitor side time-to-die time 1939 prior to receiving a renew request from node 1971, node 1972 suspects node 1971 of failure.

If a clock at node 1972 reaches subject side time-to-die time 1935 prior to receiving a renew grant from node 1971, node 1972 transitions to a failure state. If a clock at node 1971 reaches monitor side time-to-die time 1937 prior to receiving a renew request from node 1972, node 1971 suspects node 1972 of failure.

Arbitration Of Node Failures

Due to various different types of communication errors and node malfunctions, there exists some possibility that each node in a pair of nodes will suspect failure of the other node. Further, each node may suspect that it is functioning properly.

In some ring architectures, portions of resources are configured such that a single node controls a resource at a given moment in time. Further, the needed availability of some resources may also be high such that essentially constant control by a node is required. Thus, when a node fails, control of various resources may need to be transferred to another node. Accordingly, when a node in a pair of nodes suspects the other node of failure, arbitration mechanisms can be used to determine at least which node has failed or should fail.

For example, when each node in a pair of nodes suspects the other node of failing, each node can transition to a timeout state and report its suspicion to an arbitration facility. When in a timeout state, certain other processing at each node can be suspended until the results of the arbitration are received. The arbitration facility can report back to a node indicating if it is to remain active. For example, an arbitration facility can send an accept message to a reporting node that is to remain active. The arbitration facility can send a deny message to a reporting node that is to transition to a failure state. A node that receives an accept message can remain active. A node that does not receive an accept message (e.g., due to network conditions) or that receives a deny message transitions to a failure state.

FIG. 19C illustrates example ring architecture 1900 that facilitates arbitration when mutually monitoring nodes each can report that the other node is suspected of failing. FIG. 19C depicts an expanded view of node 1981 (having ID=98), monitor node 1982 (having ID=64), and arbitrator 1983.

In some embodiments, arbitrator 1983 is also a member of ring 1901. In other embodiments, arbitrator 1983 is a member of an ancestor ring of ring 1901 but is not a member of ring 1901. In further embodiments, arbitrator 1983 is external to the ring hierarchy that includes ring 1901. For example, arbitrator 1983 can be included in a separate arbitration federation ring of nodes. Nodes in the arbitration federation can be configured as arbitrators for the nodes of ring 1901 and its ancestors.

In some embodiments, arbitrator 1983 is mutually agreed to by node 1981 and node 1982 to arbitrate for nodes 1981 and 1982. In other embodiments, arbitrator 1983 is assigned to arbitrate for nodes 1981 and 1982 by another entity. The other entity can be a node internal to the ring hierarchy including ring 1901 (e.g., a seed node) or, for example, a human administrator. For example, the other node can be a member of ring 1901 or a member of an ancestor ring of ring 1901 but not a member of ring 1901. Alternately, the other entity can be external to the ring hierarchy including ring 1901. For example, the other entity can be a node that is a member of a separate arbitration federation ring.

Arbitrator 1983 can have varying knowledge of the ring hierarchy including ring 1901. For example, arbitrator 1983 can have global knowledge of the ring hierarchy including ring 1901. Alternately, arbitrator 1983 can have knowledge of some subset of rings included in the ring hierarchy including ring 1901. In other embodiments, arbitrator 1983 has knowledge of a subset of nodes in ring 1901 including (and potentially only) nodes 1981 and 1982.

Arbitrator 1983 can be configured to arbitrate for any number of node pairs including, but not limited to, nodes 1981 and 1982. In some embodiments, an arbitration mechanism has no knowledge of nodes it is to arbitrate for prior to receiving a report of a suspected node failure. Thus, although a pair of nodes have agreed to use arbitrator 1983 or arbitrator 1983 has been assigned to arbitrate for a pair of nodes, arbitrator 1983 may be unaware of any agreement or assignment prior to receiving a report of a suspected node failure for a node in the pair of nodes.

Arbitration can include arbitrating between nodes that present conflicting failure reports. For example, when a first node is monitoring a second node and the second node is also monitoring the first node, it may be that each node reports that the other node is suspected of failure. The suspected failure can be detected using virtually any failure detection mechanisms including those previously described in this document.

Failed node list 1947 can include a list of nodes that have been reported as suspected failed nodes. Nodes can report other nodes as suspected failed nodes to arbitrator 1983 and, when appropriate, arbitrator 1983 can include the reported nodes in failed node list 1947. Arbitrator 1983 can remove failed nodes from failed node list 1947 after appropriate periods of time (e.g., at a future time when continued conflict is no longer possible). For example, entries in failed node list 1947 can be removed at recovery time interval 1942 after they were inserted into failed node list 1947. Recovery time interval 1942 can be long enough to ensure that nodes that have been told to fail do fail.

FIG. 21 illustrates an example flow chart of a method 2100 for arbitrating between conflicting reports of suspected node failures. Method 2100 will be described with respect to the components and data depicted in FIG. 19C.

Method 2100 includes an act of a first node sending a report to an arbitration facility that a second node is suspected of failing (act 2101). For example, node 1981 can send report 1934 to arbitrator 1983. Method 2100 includes an act of an arbitrator receiving a report from the first node that the second node is suspected of failing (act 2102). For example, arbitrator 1983 can receive report 1934 from node 1981.

Method 2100 includes an act of the arbitrator determining that no other node has suspected the first node of failing within a specified recovery time interval prior to receiving the report from the first node (act 2103). For example, arbitrator 1983 can determine that no other node has suspected node 1981 of failing within recovery time interval 1942 (after which arbitrator 1983 would have removed node 1981 from failed node list 1947 anyway).

Method 2100 includes an act of the arbitrator recording in a list that the second node is in a failure state (act 2105). For example, arbitrator 1983 can record in failed node list 1947 that node 1982 (ID=64) is in a failure state.

Method 2100 includes an act of the arbitrator sending an accept message to the first node within a maximum response time interval, the accept message including a failure time value indicative of a time period after which the second node is guaranteed to transition into a failure state (act 2104). For example, arbitrator 1983 can send accept message 1984 to node 1981 within maximum response time interval 1943 of receiving report 1934. Accept message 1984 includes failure time interval 1936 indicative of a time when node 1982 is guaranteed to have transitioned into a failure state. Generally, a maximum response time interval represents a point in time after which a requestor (e.g., node 1981 or 1982) assumes that an arbitration facility (arbitrator 1983) will not answer a request for arbitration (e.g., report 1934 or 1938). When a maximum response time interval expires at a requestor subsequent to sending a request for arbitration, the requestor performs similar (and potentially identical) operations to those that would be performed if an express deny message was received.

Method 2100 includes an act of the first node receiving an accept message from the arbitration facility within a maximum response time interval, the accept message including a time value indicative of a time period after which the second node is guaranteed to transition into a failure state (act 2106). For example, node 1981 can receive accept message 1984, including failure time interval 1936, from arbitrator 1983. Failure time interval 1936 is indicative of a time when node 1982 is guaranteed to have transitioned into a failure state. Thus, after the expiration of failure time interval 1936, node 1981 can attempt to claim control of one or more ring resources previously controlled by node 1982.

Method 2100 includes an act of the first node claiming control of one or more ring resources previously controlled by the second node subsequent to expiration of the time period (act 2107). For example, node 1981 can claim control of one or more ring resources within ring 1901 previously controlled by node 1982 subsequent to expiration of failure time interval 1936.

Claimed ring resources can vary depending on the ring resources controlled by node 1982 prior to transition to a failure state. For example, node 1981 can assume message routing responsibilities of node 1982 (e.g., the responsibility to receive messages directed to a range of identifiers on ring 1901), any seed node responsibilities of node 1982, any arbitration responsibilities of node 1982, etc.

At some time at or after the first node reports the second node, the second node may also suspect the first node of failure. For example, it may be that node 1982 also suspects node 1981 of failure.

Method 2100 includes an act of the second node sending a report to the arbitration facility that the first node is suspected of failing (act 2108). For example, node 1982 can send report 1938 to arbitrator 1983 that node 1981 is suspected of failure. Method 2100 includes an act of the arbitrator receiving a report from the second node that the first node is suspected of failing, the report from the second node received within the specified recovery time interval subsequent to receiving the report from the first node (act 2109). For example, arbitrator 1983 can receive report 1938 from node 1982 that node 1981 is suspected of failure within recovery time interval 1942 of receiving report 1934.

Method 2100 includes an act of the arbitrator referring to the list to determine that the second node is to transition to a failure state (act 2110). For example, arbitrator 1983 can refer to failed node list 1947 to determine that node 1982 (ID=64) is to transition to a failure state.

Method 2100 includes an act of sending a deny message to the second node to cause the second node to transition into a failure state (act 2111). For example, arbitrator 1983 can send deny message 1985 to node 1982 to cause node 1982 to transition to a failure state. Method 2100 includes an act of the second node receiving a deny message from the arbitration facility (act 2112). For example, node 1982 can receive deny message 1985 from arbitrator 1983.

Method 2100 includes an act of the second node transitioning into a failure state (act 2113). For example, node 1982 can transition into a failure state in response to receiving deny message 1985. After failing, node 1982 can subsequently attempt to rejoin ring 1901.
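The arbitrator side of method 2100 can be sketched as a failed node list whose entries expire after the recovery time interval. The following is an illustrative simplification under stated assumptions: the class and method names are hypothetical, responses are modeled as the strings "accept" and "deny", and the failure time value and maximum response time interval carried by real accept messages are not modeled.

```python
from typing import Dict


class Arbitrator:
    """Hypothetical arbitrator keeping a failed node list keyed by node ID."""

    def __init__(self, recovery_interval: float) -> None:
        self.recovery_interval = recovery_interval
        # node ID -> time at which the node was recorded as failed
        self.failed_nodes: Dict[int, float] = {}

    def _expire(self, now: float) -> None:
        # Entries are removed once the recovery time interval has elapsed.
        self.failed_nodes = {
            node: recorded
            for node, recorded in self.failed_nodes.items()
            if now - recorded < self.recovery_interval
        }

    def report(self, reporter: int, suspect: int, now: float) -> str:
        """Handle a report that `suspect` is suspected of failing."""
        self._expire(now)
        if reporter in self.failed_nodes:
            # The reporter was itself reported first: it must fail.
            return "deny"
        # No recent report against the reporter: record the suspect as
        # failed and let the reporter remain active.
        self.failed_nodes[suspect] = now
        return "accept"
```

With a recovery interval of 30 time units, a report from node 98 against node 64 at time 0 is accepted, and a conflicting report from node 64 against node 98 at time 5 is denied, mirroring the exchange of reports 1934 and 1938 described above.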

Routing in Accordance with Cached Agreements

In some embodiments, messages are routed in accordance with cached routing agreements. For example, adjacent nodes of a ring can agree to a division of responsibility for a range of unoccupied identifiers between the adjacent nodes. An identifier can be unoccupied for any number of reasons. For example, an identifier may be unoccupied because the identifier is unassigned (i.e., the identifier has not been assigned to a node). For assigned identifiers (i.e., identifiers that have been assigned to a node), an identifier may be unoccupied because the corresponding node has been deliberately shut down or the node is otherwise unreachable for some reason, such as, for example, communication or node failures.

Routing agreements between nodes can be established and cached prior to nodes being permitted to accept messages for and deliver messages for any of the unoccupied identifiers that are to be the responsibility of the adjacent nodes. Reference to a cached routing agreement significantly reduces any communication between (potentially) adjacent nodes that may otherwise occur to determine which node is responsible for a specific unoccupied identifier.

A cached routing agreement can divide a range of unoccupied identifiers in an arbitrary fashion, in accordance with configurable rules, or in accordance with a fixed methodology. In some embodiments, a range of identifiers between adjacent nodes on a ring is divided essentially in half. This reduces the likelihood of an unoccupied identifier being far from the node that is responsible for it.

When there is an even number of unoccupied identifiers between adjacent nodes, the midway point between the adjacent nodes is between unoccupied identifiers. Thus, responsibility for the unoccupied identifiers can be divided at the midway point between the adjacent nodes. Accordingly, each adjacent node can be assigned responsibility for an equal number of unoccupied identifiers.

On the other hand, when there is an odd number of unoccupied identifiers between adjacent nodes, the midway point between the adjacent nodes is at an unoccupied identifier. Thus, responsibility for the unoccupied identifiers can be divided at one side or the other of the unoccupied identifier that is the midway point. Accordingly, one adjacent node can be assigned responsibility for one more unoccupied identifier than the other adjacent node.

For example, FIG. 22A illustrates an example ring architecture 2200 that facilitates routing a message in accordance with a cached two-way agreement between nodes. As depicted, various nodes (shown as squares on ring 2250) including (but not limited to) nodes 2201, 2202, 2203, 2261, 2262, and 2263 are included on ring 2250. Each node has a corresponding ID (shown in parentheses) indicating its position on ring 2250. For example, node 2201 has ID=64 and node 2202 has ID=30.

There are ranges of unoccupied identifiers between the depicted nodes. For example, unoccupied identifier range 2211 represents unoccupied identifiers 31 through 63 between nodes 2202 and 2201.

As depicted, nodes 2201 and 2202 have established and cached two-way agreement 2223. For example, through prior communication, nodes 2201 and 2202 can determine that there are no other nodes currently interspersed between ID=64 and ID=30. Thus, nodes 2201 and 2202 can further determine that they are adjacent to one another on ring 2250. Accordingly, nodes 2201 and 2202 can divide responsibility for unoccupied identifier range 2211 (i.e., unoccupied identifiers 31 through 63) such that node 2202 is responsible for a portion of unoccupied identifier range 2211 and node 2201 is responsible for the remaining portion of unoccupied identifier range 2211. Each node is also responsible for its assigned ID. That is, node 2202 is responsible for ID=30 and node 2201 is responsible for ID=64.

Accordingly, as depicted by responsibility boundary 2213 (between unoccupied identifier 47 and unoccupied identifier 48), node 2202 (ID=30) is responsible for itself as well as unoccupied identifiers 31 through 47 and node 2201 (ID=64) is responsible for itself as well as unoccupied identifiers 48 through 63. Although the midway point between nodes 2201 and 2202 is at unoccupied identifier 47, node 2202 is assigned responsibility for unoccupied identifier 47 such that each unoccupied identifier is the responsibility of a single node. Thus, as previously described, when a responsibility boundary falls on an unoccupied identifier, one of the adjacent nodes can be assigned the sole responsibility for the unoccupied identifier.
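The division of an unoccupied identifier range essentially in half can be illustrated with a short calculation. This sketch assumes an identifier space of 256 values (suggested by, but not stated in, the depicted IDs) and assumes that a midway identifier is always assigned to the lower-ID side, as in the example above; the function names are hypothetical, and degenerate gaps with fewer than two unoccupied identifiers are not handled.

```python
from typing import Tuple

RING_SIZE = 256  # assumed size of the identifier space


def responsibility_boundary(pred_id: int, succ_id: int,
                            ring_size: int = RING_SIZE) -> int:
    """Return the last identifier owned by the predecessor-side node.

    The gap is measured clockwise from pred_id to succ_id, so the
    calculation also works when the range wraps past the top of the ring.
    """
    gap = (succ_id - pred_id) % ring_size
    return (pred_id + gap // 2) % ring_size


def split_unoccupied_range(
    pred_id: int, succ_id: int, ring_size: int = RING_SIZE
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return the (first, last) unoccupied identifiers owned by each node."""
    boundary = responsibility_boundary(pred_id, succ_id, ring_size)
    pred_part = ((pred_id + 1) % ring_size, boundary)
    succ_part = ((boundary + 1) % ring_size, (succ_id - 1) % ring_size)
    return pred_part, succ_part
```

Applied to the node with ID=30 and the node with ID=64, the sketch places the boundary after identifier 47, giving ranges 31 through 47 and 48 through 63.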

FIG. 24 illustrates an example flow chart of a method 2400 for routing a message in accordance with a cached two-way agreement. Method 2400 will be described with respect to the nodes and messages depicted in ring architecture 2200 of FIG. 22A.

Method 2400 includes an act of a receiving node receiving a message along with a destination identifier indicating a destination on the ring of nodes, the destination identifier located between the receiving node and one of the immediate neighbor nodes (act 2401). For example, node 2201 can receive message 2251, indicated for delivery to ID=55. Alternately, node 2201 can receive message 2252, indicated for delivery to ID=39. Messages 2251 and 2252 can be received from another node in ring 2250 (intra-ring communication), from a node in another ring of ring architecture 2200 (inter-ring communication), or through non-ring communication.

Method 2400 includes an act of the receiving node referring to a cached two-way agreement between the receiving node and the immediate neighbor node to determine the next appropriate node that is to receive the message (act 2402). The two-way agreement at least implies a division of responsibility for the identifier space between the receiving node and an immediate neighbor node. For example, node 2201 can refer to cached two-way agreement 2223 to determine the next appropriate node that is to process message 2251. Since cached two-way agreement 2223 indicates that node 2201 (ID=64) is responsible for unoccupied identifier 55, node 2201 determines that it is the appropriate node to process message 2251. Likewise, node 2201 can refer to cached two-way agreement 2223 to determine the next appropriate node that is to process message 2252. Since cached two-way agreement 2223 indicates that node 2202 (ID=30) is responsible for unoccupied identifier 39, node 2201 determines that node 2202 is the next appropriate node that is to process message 2252.

Method 2400 includes an act of sending the message to the next appropriate component based on the determination of the next appropriate node (act 2403). For example, node 2201 can provide message 2251 to its resource handler instance corresponding to unoccupied identifier 55, since cached two-way agreement 2223 indicates that node 2201 is responsible for unoccupied identifier 55. Alternately, node 2201 can provide message 2252 to node 2202, since cached two-way agreement 2223 indicates that node 2202 is responsible for unoccupied identifier 39. Subsequently, node 2202 can provide message 2252 to its resource handler instance corresponding to unoccupied identifier 39.

When an identifier is not included in a cached two-way agreement, a node can refer to a routing table (e.g., as depicted in FIG. 3) to make progress towards a destination. For example, node 2201 can send message 2253, indicated for delivery to ID=203, to node 2261 (ID=200). Node 2261 can then refer to any cached two-way agreements with its adjacent nodes to determine the node that is responsible for identifier 203.
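The routing decision of method 2400 can be sketched as a lookup against a cached agreement with a fallback to the routing table. This is a minimal illustration under stated assumptions: the `TwoWayAgreement` type, its field names, the 256-value identifier space, and the string results returned by `route` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

RING_SIZE = 256  # assumed size of the identifier space


@dataclass(frozen=True)
class TwoWayAgreement:
    """Hypothetical cached agreement between two adjacent nodes."""
    low_node: int    # ID of the node at the start of the covered range
    high_node: int   # ID of the node at the end of the covered range
    boundary: int    # last identifier that is the responsibility of low_node

    def owner(self, dest: int) -> Optional[int]:
        """Return the responsible node ID, or None if dest is not covered."""
        span = (self.high_node - self.low_node) % RING_SIZE
        offset = (dest - self.low_node) % RING_SIZE
        if offset > span:
            return None
        boundary_offset = (self.boundary - self.low_node) % RING_SIZE
        return self.low_node if offset <= boundary_offset else self.high_node


def route(self_id: int, agreement: TwoWayAgreement, dest: int) -> str:
    """Decide what the receiving node does with a message for dest."""
    owner = agreement.owner(dest)
    if owner is None:
        return "consult routing table"
    if owner == self_id:
        return "deliver locally"
    return f"forward to node {owner}"
```

Using values corresponding to two-way agreement 2223 (nodes with ID=30 and ID=64, boundary after identifier 47), a message for ID=55 is delivered locally at the node with ID=64, a message for ID=39 is forwarded to the node with ID=30, and a message for ID=203 falls through to the routing table.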

In some embodiments, multiple two-way agreements can, from the perspective of a given node, essentially represent a three-way agreement between the given node, the given node's immediate predecessor node, and the given node's immediate successor node. FIG. 22B illustrates the example ring architecture 2200 that facilitates routing a message in accordance with multiple cached two-way agreements.

As previously described, nodes 2201 and 2202 can establish cached two-way agreement 2223. Similarly, nodes 2201 and 2203 can establish cached two-way agreement 2224 to divide responsibility for unoccupied identifier range 2212 (i.e., unoccupied identifiers 65 through 100). Thus, through prior communication, nodes 2201 and 2203 can determine that there are no other nodes currently interspersed between ID=64 and ID=101. Thus, nodes 2201 and 2203 can further determine that they are adjacent to one another on ring 2250. Accordingly, nodes 2201 and 2203 can divide unoccupied identifier range 2212 such that node 2203 is responsible for a portion of unoccupied identifier range 2212 and node 2201 is responsible for the remaining portion of unoccupied identifier range 2212. Accordingly, as depicted within two-way agreement 2224, node 2201 (ID=64) is responsible for itself as well as unoccupied identifiers 65 through 82 and node 2203 (ID=101) is responsible for itself as well as unoccupied identifiers 83 through 100.

From the perspective of node 2201, the combination of cached two-way agreement 2223 and cached two-way agreement 2224 essentially represents three-way agreement 2273. That is, node 2201 is responsible for a portion of identifier space between node 2201 and node 2202 and is responsible for a portion of identifier space between node 2201 and node 2203. The parenthetical ranges of identifiers indicate the ranges of responsibility (i.e., 47 through 64 and 64 through 82) from the cached two-way agreements 2223 and 2224 on either side of node 2201.

FIG. 25 illustrates an example flow chart of a method 2500 for routing a message in accordance with multiple cached two-way agreements. Method 2500 will be described with respect to the nodes and messages depicted in ring architecture 2200 of FIG. 22B.

Method 2500 includes an act of a receiving node receiving a message along with a destination identifier indicating a destination on the ring of nodes (act 2501). For example, node 2201 can receive any of messages 2251, 2252, 2253, 2254, and 2256 indicated for delivery to ID=55, ID=39, ID=203, ID=74, and ID=94 respectively. Messages 2251, 2252, 2253, 2254, and 2256 can be received from another node in ring 2250 (intra-ring communication), from a node in another ring of ring architecture 2200 (inter-ring communication), or through non-ring communication.

Method 2500 includes an act of the receiving node referring to a first cached two-way agreement with the predecessor node and a second cached two-way agreement with the successor node to determine the next appropriate node that is to receive the message (act 2502). The first and second cached two-way agreements at least imply a division of responsibility for the identifier space between the predecessor node and the successor node. For example, node 2201 can refer to cached two-way agreements 2223 and 2224 to determine the next appropriate node that is to receive any of messages 2251, 2252, 2253, 2254, and 2256.

Since cached two-way agreement 2223 indicates that node 2202 (ID=30) is responsible for unoccupied identifier 39, node 2201 determines that node 2202 is the next appropriate node that is to process message 2252. Since cached two-way agreement 2223 indicates that node 2201 (ID=64) is responsible for unoccupied identifier 55, node 2201 determines that it is the appropriate node to process message 2251. Since cached two-way agreement 2224 indicates that node 2201 (ID=64) is responsible for unoccupied identifier 74, node 2201 determines that it is the appropriate node to process message 2254. Since cached two-way agreement 2224 indicates that node 2203 (ID=101) is responsible for unoccupied identifier 94, node 2201 determines that node 2203 is the next appropriate node that is to process message 2256.

Method 2500 includes an act of sending the message to the next appropriate component based on the determination of the next appropriate node (act 2503). For example, node 2201 can send messages 2251, 2252, 2253, 2254, and 2256 to the next appropriate component on ring 2250 based on the determination of the next appropriate node that is to process messages 2251, 2252, 2253, 2254, and 2256.

For example, node 2201 can provide message 2252 to node 2202, since cached two-way agreement 2223 indicates that node 2202 is responsible for unoccupied identifier 39. Subsequently, node 2202 can provide message 2252 to its resource handler instance corresponding to unoccupied identifier 39. Node 2201 can provide message 2251 to its resource handler instance corresponding to unoccupied identifier 55, since cached two-way agreement 2223 indicates that node 2201 is responsible for unoccupied identifier 55. Node 2201 can provide message 2254 to its resource handler instance corresponding to unoccupied identifier 74, since cached two-way agreement 2224 indicates that node 2201 is responsible for unoccupied identifier 74. Node 2201 can provide message 2256 to node 2203, since cached two-way agreement 2224 indicates that node 2203 is responsible for unoccupied identifier 94. Subsequently, node 2203 can provide message 2256 to its resource handler instance corresponding to unoccupied identifier 94.

When an identifier is not included in either of multiple cached two-way agreements, a node can refer to a routing table (e.g., as depicted in FIG. 3) to make progress towards a destination. For example, node 2201 can send message 2253, indicated for delivery to ID=203, to node 2261 (ID=200). Node 2261 can then refer to any cached two-way agreements with its predecessor node and/or its successor node to determine the next appropriate component that is to receive message 2253.

Formulating Cached Agreements

Rings can be reconfigured from time to time, such as, for example, when a new node joins a ring or when an existing node departs a ring (e.g., through graceful removal, as a result of node monitoring, through reference to an arbitrator, etc.). When a node detects that the configuration of a ring has changed, the node can reformulate cached routing agreements with any adjacent nodes. During agreement reformulation, the node can queue any received messages, except those for formulating the agreement. After formulation of the agreement is complete, the node can then process the messages in accordance with the agreement.
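The queueing behavior during agreement reformulation can be sketched as a simple gate in front of message processing. The class name, method names, and the boolean flag used to mark agreement-formulation messages are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List


class AgreementGate:
    """Hypothetical gate that holds messages while an agreement is reformulated."""

    def __init__(self) -> None:
        self.reformulating = False
        self.queued: Deque[str] = deque()
        self.processed: List[str] = []

    def begin_reformulation(self) -> None:
        self.reformulating = True

    def receive(self, message: str, is_agreement_message: bool = False) -> None:
        # Messages for formulating the agreement are never queued;
        # everything else waits until the new agreement is in place.
        if self.reformulating and not is_agreement_message:
            self.queued.append(message)
        else:
            self.processed.append(message)

    def complete_reformulation(self) -> None:
        # Drain the queue in arrival order under the new agreement.
        self.reformulating = False
        while self.queued:
            self.processed.append(self.queued.popleft())
```

Processing queued messages in arrival order is a design choice of this sketch; the description above only requires that they be processed in accordance with the new agreement.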

Reconfiguration of a ring can cause multiple routing agreements to be reformulated. For example, when a node departs a ring, immediately adjacent nodes on either side of the departing node can formulate an agreement for the range of unoccupied identifiers that were previously the responsibility of the departing node (thus potentially gaining responsibility for additional unoccupied identifiers). This reformulation joins responsibility for a portion of the range of unoccupied identifiers from the departing node with the range of unoccupied identifiers for each immediately adjacent node. That is, each immediately adjacent node gains responsibility for a portion of the departing node's range of unoccupied identifiers and the departing node's identifier.

FIGS. 23A through 23D illustrate an example ring architecture 2300 that facilitates formulating a cached two-way agreement. As depicted in FIG. 23A, nodes 2301 and 2302 have formulated cached two-way agreement 2323 dividing responsibility for unoccupied identifier range 2312 (i.e., unoccupied identifiers 31 through 63) at responsibility boundary 2313 (between unoccupied identifier 47 and unoccupied identifier 48). Similarly, nodes 2302 and 2362 have formulated cached two-way agreement 2343 dividing responsibility for unoccupied identifier range 2311 (i.e., unoccupied identifiers 255 through 29) at responsibility boundary 2333 (between unoccupied identifiers 14 and 15).

At some time subsequent to the formulation of cached two-way agreements 2323 and 2343, node 2302 can leave ring 2350 (e.g., through graceful removal, as a result of node monitoring, based on instructions from an arbitrator, etc.). Referring now to FIG. 23B, subsequent to node 2302 leaving ring 2350 there is no node responsible for the unoccupied identifiers that were previously the responsibility of node 2302. Unoccupied identifier range 2313 (unoccupied identifiers 15 through 47, including now unoccupied identifier 30) represents the range of unoccupied identifiers that node 2302 was responsible for prior to departing ring 2350.

In response to node 2302 leaving ring 2350, nodes 2301 and 2362 attempt to identify new immediate neighbor nodes. Node 2362 attempts to identify a new immediate successor node (i.e., an immediate neighbor node in the same direction as node 2302 relative to node 2362). Node 2301 attempts to identify a new immediate predecessor node (i.e., an immediate neighbor in the same direction as node 2302 relative to node 2301). In FIG. 23B, node 2362 identifies node 2301 as its new immediate successor and node 2301 identifies node 2362 as its new immediate predecessor.

Upon identifying new immediate neighbor nodes, nodes 2362 and 2301 formulate cached two-way agreement 2363 that divides responsibility for unoccupied identifier range 2314 (unoccupied identifiers 255 through 63, including now unoccupied identifier 30). Unoccupied identifier range 2314 includes unoccupied identifier range 2313, which was previously the responsibility of node 2302. Thus, portions of unoccupied identifier range 2313 can become the responsibility of either node 2362 or node 2301, after node 2302 departs ring 2350.
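As a rough illustration of how the two ranges on either side of a departed node combine into the single range covered by the new agreement, consider the sketch below. It assumes a 256-identifier ring (identifiers 0 through 255), which is inferred from the example IDs rather than stated outright, and the function name `merge_after_departure` is hypothetical.

```python
RING_SIZE = 256  # assumption: identifiers 0..255, inferred from the example IDs


def merge_after_departure(lower_range, departed_id, upper_range, size=RING_SIZE):
    """Combine two inclusive (first, last) ranges that abut a departed node's ID.

    The departed node's own identifier becomes unoccupied and is absorbed
    into the merged range.
    """
    lower_first, lower_last = lower_range
    upper_first, upper_last = upper_range
    if (lower_last + 1) % size != departed_id:
        raise ValueError("lower range does not end just before the departed ID")
    if (departed_id + 1) % size != upper_first:
        raise ValueError("upper range does not start just after the departed ID")
    return (lower_first, upper_last)


# Ranges 255..29 and 31..63 around departed ID 30 merge into 255..63.
merged = merge_after_departure((255, 29), 30, (31, 63))
```

With the values from FIGS. 23A and 23B, `merged` is `(255, 63)`, matching unoccupied identifier range 2314.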

Accordingly, as depicted by responsibility boundary 2353 (between unoccupied identifier 31 and unoccupied identifier 32), node 2362 (ID=254) and node 2301 (ID=64) formulate cached two-way agreement 2363. In accordance with cached two-way agreement 2363, node 2362 (ID=254) is responsible for itself as well as unoccupied identifiers 255 through 31 and node 2301 (ID=64) is responsible for itself as well as identifier range 32 through 63. Although the midway point between nodes 2362 and 2301 is at unoccupied identifier 31, node 2362 is assigned responsibility for unoccupied identifier 31 such that each unoccupied identifier is the responsibility of a single node.
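The effect of a responsibility boundary can be illustrated with a small lookup routine. The sketch below is not drawn from the described embodiments: it assumes a 256-identifier ring, takes the boundary as a given input rather than computing it, and uses the hypothetical helper names `in_range` and `responsible_node`.

```python
RING_SIZE = 256  # assumption: identifiers 0..255, inferred from the example IDs


def in_range(x, first, last, size=RING_SIZE):
    """True if x lies in the inclusive clockwise range first..last (may wrap past 0)."""
    return (x - first) % size <= (last - first) % size


def responsible_node(x, lower_id, upper_id, last_owned_by_lower, size=RING_SIZE):
    """Return the ID of the node responsible for x under a two-way agreement.

    lower_id owns itself through last_owned_by_lower; upper_id owns the
    remaining identifiers up to and including itself.
    """
    if in_range(x, lower_id, last_owned_by_lower, size):
        return lower_id
    if in_range(x, (last_owned_by_lower + 1) % size, upper_id, size):
        return upper_id
    raise ValueError("identifier is outside the range covered by this agreement")


# Boundary between 31 and 32, with nodes at ID=254 and ID=64.
owner_31 = responsible_node(31, 254, 64, 31)
owner_32 = responsible_node(32, 254, 64, 31)
```

Here `owner_31` is 254 and `owner_32` is 64, so each identifier in the range resolves to exactly one node.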

During the time between the departure of node 2302 and formulation of cached two-way agreement 2363, nodes 2301 and 2362 do not process messages indicated for delivery to identifiers in the range between 255 and 63. Instead, nodes 2301 and 2362 queue any messages, except those for formulating cached two-way agreement 2363. After formulation of cached two-way agreement 2363 is complete, nodes 2301 and 2362 can then process the messages in accordance with cached two-way agreement 2363.

When a new node joins a ring between two existing nodes, each existing node can formulate a routing agreement with the new node (and thus potentially give up responsibility for a portion of unoccupied identifiers). This formulation can essentially split a range of unoccupied identifiers that an existing node is responsible for between the joining node and the existing node. That is, each existing node potentially gives up responsibility for a portion of the existing node's unoccupied identifiers to the joining node.

Referring now to FIG. 23C, at some time subsequent to the formulation of cached two-way agreement 2363, node 2304 (ID=44) can join ring 2350. Subsequent to node 2304 joining ring 2350, node 2362 can detect node 2304 as its immediate successor. Likewise, node 2301 can detect node 2304 as its immediate predecessor. In response to each of the detections, unoccupied identifier range 2314 is essentially split into unoccupied identifier range 2315 (unoccupied identifiers 255 through 43) and unoccupied identifier range 2316 (unoccupied identifiers 45 through 63). New cached two-way agreements can then be formulated to divide responsibility for unoccupied identifier ranges 2315 and 2316.
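The split of a range around a newly occupied identifier can be sketched as below. As before, the 256-identifier ring size is an assumption inferred from the example IDs, and `split_on_join` is a hypothetical helper name, not part of the described embodiments.

```python
RING_SIZE = 256  # assumption: identifiers 0..255, inferred from the example IDs


def split_on_join(first, last, new_id, size=RING_SIZE):
    """Split the inclusive clockwise range first..last around new_id.

    new_id becomes occupied, so it belongs to neither resulting range.
    The sketch requires new_id to lie strictly inside the range.
    """
    offset = (new_id - first) % size
    span = (last - first) % size
    if not 0 < offset < span:
        raise ValueError("new_id must lie strictly inside the range")
    lower = (first, (new_id - 1) % size)
    upper = ((new_id + 1) % size, last)
    return lower, upper


# Range 255..63 split by a node joining at ID=44.
range_lower, range_upper = split_on_join(255, 63, 44)
```

With these inputs, `range_lower` is `(255, 43)` and `range_upper` is `(45, 63)`, corresponding to unoccupied identifier ranges 2315 and 2316.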

Referring now to FIG. 23D, upon identifying node 2304 as a new immediate successor node, nodes 2362 and 2304 formulate cached two-way agreement 2394 that divides responsibility for unoccupied identifier range 2315 (unoccupied identifiers 255 through 43). Unoccupied identifier range 2315 includes portions of unoccupied identifier range 2314, which were previously the responsibility of node 2362 and, in this case, some of which were previously the responsibility of node 2301. Thus, portions of unoccupied identifier range 2314 that were the responsibility of either node 2362 or node 2301 can become the responsibility of node 2304 when node 2304 joins ring 2350.

Accordingly, as depicted by responsibility boundary 2393 (between unoccupied identifier 23 and unoccupied identifier 24), node 2362 (ID=254) and node 2304 (ID=44) formulate cached two-way agreement 2394. In accordance with cached two-way agreement 2394, node 2362 (ID=254) is responsible for itself as well as unoccupied identifiers 255 through 23 and node 2304 (ID=44) is responsible for itself as well as identifier range 24 through 43. Although the midway point between nodes 2362 and 2304 is at unoccupied identifier 23, node 2362 is assigned responsibility for unoccupied identifier 23 such that each unoccupied identifier is the responsibility of a single node.

Similarly, upon identifying node 2304 as a new immediate predecessor node, nodes 2301 and 2304 formulate cached two-way agreement 2383 that divides responsibility for unoccupied identifier range 2316 (unoccupied identifiers 45 through 63). Unoccupied identifier range 2316 includes portions of unoccupied identifier range 2314, which were previously the responsibility of node 2301. Thus, portions of unoccupied identifier range 2314, which were the responsibility of node 2301, can become the responsibility of node 2304 when node 2304 joins ring 2350.

Accordingly, as depicted by responsibility boundary 2373 (between unoccupied identifier 54 and unoccupied identifier 55), node 2304 (ID=44) and node 2301 (ID=64) formulate cached two-way agreement 2383. In accordance with cached two-way agreement 2383, node 2304 (ID=44) is responsible for itself as well as unoccupied identifiers 45 through 54 and node 2301 (ID=64) is responsible for itself as well as identifier range 55 through 63. Although the midway point between nodes 2304 and 2301 is at unoccupied identifier 54, node 2304 is assigned responsibility for unoccupied identifier 54 such that each unoccupied identifier is the responsibility of a single node.

During the time between the joining of node 2304 and formulation of cached two-way agreement 2394, nodes 2362 and 2304 do not process messages indicated for delivery to identifiers in the range between 255 and 43. Instead, nodes 2362 and 2304 queue any messages, except those for formulating cached two-way agreement 2394. After formulation of cached two-way agreement 2394 is complete, nodes 2362 and 2304 can then process the messages in accordance with cached two-way agreement 2394.

Similarly, during the time between the joining of node 2304 and formulation of cached two-way agreement 2383, nodes 2304 and 2301 do not process messages indicated for delivery to identifiers in the range between 45 and 63. Instead, nodes 2304 and 2301 queue any messages, except those for formulating cached two-way agreement 2383. After formulation of cached two-way agreement 2383 is complete, nodes 2304 and 2301 can then process the messages in accordance with cached two-way agreement 2383.

From the perspective of node 2304, the combination of cached two-way agreement 2394 and cached two-way agreement 2383 can essentially represent a corresponding three-way agreement (not shown) between node 2304, node 2362, and node 2301. From the perspective of node 2304, the corresponding represented three-way agreement defines responsibility for (assigned and unoccupied) identifiers from and including ID=254 to and including ID=64.
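One way to picture the represented three-way agreement is as a single ownership table assembled from the two boundaries. The sketch below takes the boundaries as given inputs and assumes a 256-identifier ring. The names `three_way_view` and `owner_of` are hypothetical.

```python
RING_SIZE = 256  # assumption: identifiers 0..255, inferred from the example IDs


def in_range(x, first, last, size=RING_SIZE):
    """True if x lies in the inclusive clockwise range first..last (may wrap past 0)."""
    return (x - first) % size <= (last - first) % size


def three_way_view(pred_id, self_id, succ_id, pred_last, self_last, size=RING_SIZE):
    """Build [(owner, first, last), ...] covering pred_id through succ_id inclusive.

    pred_last is the last identifier owned by the predecessor; self_last is
    the last identifier owned by the middle node.
    """
    return [
        (pred_id, pred_id, pred_last),
        (self_id, (pred_last + 1) % size, self_last),
        (succ_id, (self_last + 1) % size, succ_id),
    ]


def owner_of(x, view):
    """Return the owning node ID for x, or None if x is outside the view."""
    for owner, first, last in view:
        if in_range(x, first, last):
            return owner
    return None


# Boundaries between 23 and 24, and between 54 and 55, for IDs 254, 44 and 64.
view = three_way_view(254, 44, 64, pred_last=23, self_last=54)
```

In this sketch, `view` is `[(254, 254, 23), (44, 24, 54), (64, 55, 64)]`, which spans identifiers from and including ID=254 to and including ID=64.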

FIG. 26 illustrates an example flow chart of a method 2600 for joining a two-way agreement. Method 2600 will be discussed with respect to the nodes and agreements in FIGS. 23A through 23D.

Method 2600 includes an act of a current node accessing an indication that the configuration of the ring of nodes has changed, the indication indicative of a need to formulate a two-way agreement dividing responsibility for at least unoccupied identifiers on the ring between the current node and the immediate neighbor node (act 2601). For example, referring to FIGS. 23A and 23B, node 2301 and/or node 2362 can access an indication, for example, from node 2302, through monitoring of node 2302, or from an arbitrator, that node 2302 departed ring 2350. The indication of node 2302 departing ring 2350 indicates to node 2301 and/or node 2362 a need to formulate a two-way agreement dividing responsibility for unoccupied identifier range 2314 (unoccupied identifiers 255 through 63).

Alternately, referring to FIGS. 23C and 23D, node 2301 can access an indication (e.g., sent as part of the join process of node 2304) that node 2304 has joined ring 2350. The indication of node 2304 joining ring 2350 indicates to node 2301 a need to formulate a two-way agreement dividing responsibility for unoccupied identifier range 2316 (unoccupied identifiers 45 through 63). Similarly, node 2362 can access an indication (e.g., sent as part of the join process of node 2304) that node 2304 has joined ring 2350. The indication of node 2304 joining ring 2350 indicates to node 2362 a need to formulate a two-way agreement dividing responsibility for unoccupied identifier range 2315 (unoccupied identifiers 255 through 43).

Method 2600 includes an act of the current node and the immediate neighbor node agreeing to a responsibility boundary between the current node and the immediate neighbor node that is to divide responsibility for the unoccupied identifiers between the current node and the immediate neighbor node (act 2602). Unoccupied identifiers between the current node and the responsibility boundary are the responsibility of the current node and unoccupied identifiers between the responsibility boundary and the immediate neighbor node are the responsibility of the immediate neighbor node.

For example, referring to FIG. 23B, node 2301 and node 2362 can agree to responsibility boundary 2353, which is essentially between unoccupied identifiers 31 and 32. Thus, unoccupied identifiers between node 2301 and responsibility boundary 2353 (i.e., unoccupied identifiers 32 through 63) are the responsibility of node 2301. Likewise, unoccupied identifiers between responsibility boundary 2353 and node 2362 (i.e., unoccupied identifiers 255 through 31) are the responsibility of node 2362.

Referring to FIG. 23D, node 2301 and node 2304 can agree to responsibility boundary 2373, which is essentially between unoccupied identifiers 54 and 55. Thus, unoccupied identifiers between node 2301 and responsibility boundary 2373 (i.e., identifiers 55 through 63) are the responsibility of node 2301. Likewise, unoccupied identifiers between responsibility boundary 2373 and node 2304 (i.e., unoccupied identifiers 45 through 54) are the responsibility of node 2304.

Still referring to FIG. 23D, node 2304 and node 2362 can agree to responsibility boundary 2393, which is essentially between unoccupied identifiers 23 and 24. Thus, identifiers between node 2304 and responsibility boundary 2393 (i.e., unoccupied identifiers 24 through 43) are the responsibility of node 2304. Likewise, unoccupied identifiers between responsibility boundary 2393 and node 2362 (i.e., unoccupied identifiers 255 through 23) are the responsibility of node 2362.

FIG. 6 and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the invention may be implemented. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by computer systems. Generally, program modules include routines, programs, objects, components, data structures, and the like, which perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing acts of the methods disclosed herein.

With reference to FIG. 6, an example system for implementing the invention includes a general-purpose computing device in the form of computer system 620, including a processing unit 621, a system memory 622, and a system bus 623 that couples various system components including the system memory 622 to the processing unit 621. Processing unit 621 can execute computer-executable instructions designed to implement features of computer system 620, including features of the present invention. The system bus 623 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes read only memory (“ROM”) 624 and random access memory (“RAM”) 625. A basic input/output system (“BIOS”) 626, containing the basic routines that help transfer information between elements within computer system 620, such as during start-up, may be stored in ROM 624.

The computer system 620 may also include magnetic hard disk drive 627 for reading from and writing to magnetic hard disk 639, magnetic disk drive 628 for reading from or writing to removable magnetic disk 629, and optical disk drive 630 for reading from or writing to removable optical disk 631, such as, for example, a CD-ROM or other optical media. The magnetic hard disk drive 627, magnetic disk drive 628, and optical disk drive 630 are connected to the system bus 623 by hard disk drive interface 632, magnetic disk drive interface 633, and optical drive interface 634, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-executable instructions, data structures, program modules, and other data for the computer system 620. Although the example environment described herein employs magnetic hard disk 639, removable magnetic disk 629 and removable optical disk 631, other types of computer readable media for storing data can be used, including magnetic cassettes, flash memory cards, digital versatile disks, Bernoulli cartridges, RAMs, ROMs, and the like.

Program code means comprising one or more program modules may be stored on hard disk 639, magnetic disk 629, optical disk 631, ROM 624 or RAM 625, including an operating system 635, one or more application programs 636, other program modules 637, and program data 638. A user may enter commands and information into computer system 620 through keyboard 640, pointing device 642, or other input devices (not shown), such as, for example, a microphone, joy stick, game pad, scanner, or the like. These and other input devices can be connected to the processing unit 621 through input/output interface 646 coupled to system bus 623. Input/output interface 646 logically represents any of a wide variety of different interfaces, such as, for example, a serial port interface, a PS/2 interface, a parallel port interface, a Universal Serial Bus (“USB”) interface, or an Institute of Electrical and Electronics Engineers (“IEEE”) 1394 interface (i.e., a FireWire interface), or may even logically represent a combination of different interfaces.

A monitor 647 or other display device is also connected to system bus 623 via video interface 648. Speakers 669 or other audio output device is also connected to system bus 623 via audio interface 649. Other peripheral output devices (not shown), such as, for example, printers, can also be connected to computer system 620.

Computer system 620 is connectable to networks, such as, for example, an office-wide or enterprise-wide computer network, a home network, an intranet, and/or the Internet. Computer system 620 can exchange data with external sources, such as, for example, remote computer systems, remote applications, and/or remote databases over such networks.

Computer system 620 includes network interface 653, through which computer system 620 receives data from external sources and/or transmits data to external sources. As depicted in FIG. 6, network interface 653 facilitates the exchange of data with remote computer system 683 via link 651. Network interface 653 can logically represent one or more software and/or hardware modules, such as, for example, a network interface card and corresponding Network Driver Interface Specification (“NDIS”) stack. Link 651 represents a portion of a network (e.g., an Ethernet segment), and remote computer system 683 represents a node of the network.

Likewise, computer system 620 includes input/output interface 646, through which computer system 620 receives data from external sources and/or transmits data to external sources. Input/output interface 646 is coupled to modem 654 (e.g., a standard modem, a cable modem, or digital subscriber line (“DSL”) modem) via link 659, through which computer system 620 receives data from and/or transmits data to external sources. As depicted in FIG. 6, input/output interface 646 and modem 654 facilitate the exchange of data with remote computer system 693 via link 652. Link 652 represents a portion of a network and remote computer system 693 represents a node of the network.

While FIG. 6 represents a suitable operating environment for the present invention, the principles of the present invention may be employed in any system that is capable of, with suitable modification if necessary, implementing the principles of the present invention. The environment illustrated in FIG. 6 is illustrative only and by no means represents even a small portion of the wide variety of environments in which the principles of the present invention may be implemented.

In accordance with the present invention, nodes, application layers, and other lower layers, as well as associated data, including routing tables and node IDs may be stored and accessed from any of the computer-readable media associated with computer system 620. For example, portions of such modules and portions of associated program data may be included in operating system 635, application programs 636, program modules 637 and/or program data 638, for storage in system memory 622.

When a mass storage device, such as, for example, magnetic hard disk 639, is coupled to computer system 620, such modules and associated program data may also be stored in the mass storage device. In a networked environment, program modules depicted relative to computer system 620, or portions thereof, can be stored in remote memory storage devices, such as system memory and/or mass storage devices associated with remote computer system 683 and/or remote computer system 693. Execution of such modules may be performed in a distributed environment as previously described.

FIG. 27 illustrates a ring architecture 2700 in which the principles of the present invention may be employed. Ring architecture 2700 includes ring of nodes 2705. In some embodiments, ring of nodes 2705 may be similar to or the same as ring 2350 in FIG. 23C, as described above. Ring of nodes 2705 may include joining node 2710 which may be attempting to join the ring between immediately adjacent node 1 (2720) and immediately adjacent node 2 (2730). In some embodiments, joining node 2710 may join ring of nodes 2705 in a manner similar to that described in FIG. 23C, where the joining node determines an identifier range based on a cached agreement between nodes 2301 and 2362. A method for maintaining ring consistency during the joining of a node is described in further detail below with reference to the nodes and data items of FIG. 27.

FIG. 28 illustrates a flowchart of a method 2800 for maintaining ring consistency when a joining node joins the ring of nodes. In some embodiments, method 2800 incorporates multiple sub methods, each stemming from a different node's point of view. The method 2800 will now be described with frequent reference to the components and data of environment 2700 as well as state diagram 3000 of FIG. 30.

Method 2800 includes an act of a joining node establishing a neighborhood of a plurality of other nodes on the ring, the neighborhood including at least an immediately adjacent predecessor node and an immediately adjacent successor node (act 2805). For example, joining node 2710 may establish a neighborhood of a plurality of other nodes on ring 2705, where the neighborhood includes immediately adjacent node 2720 and other immediately adjacent node 2730. In some embodiments, such as in state diagram 3000, joining node 3005 may establish a neighborhood by sending introduction messages (e.g. Intro 3006) in step 1. Each node that receives such an introduction message may respond with an acknowledgement (ACK) message (e.g. ACK 3007) in step 2 of the state diagram. Intro 3006 may include one or more portions of information used to identify the joining node and indicate that joining node 3005 intends to join ring 2705.

From the ACK messages received back by joining node 3005, the joining node may be configured to determine which node is the closest to it on the ring. For example, each ACK message may include identifier ranges and/or position identifiers indicating the node's position on the ring and the ranges for which the node has responsibility. Thus, in state diagram 3000, joining node 3005 may determine that immediately adjacent node 3 (3010) is the joining node's immediately adjacent predecessor node and that immediately adjacent node 5 (3015) is the joining node's immediately adjacent successor node. Furthermore, joining node 3005 may determine that adjacent node 1 (3020) and adjacent node 2 (3025) are on the same ring as the joining node, but are not necessarily the joining node's immediately closest nodes. Thus, neighborhood establishment 3050 may be accomplished according to exemplary state diagram 3000.
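A joining node's selection of its immediately adjacent predecessor and successor from the nodes that acknowledged its introduction might look like the sketch below. It uses only the responders' ring positions (not the ownership ranges an ACK may also carry), assumes a 256-identifier ring, and uses hypothetical responder IDs. The function name `immediate_neighbors` is likewise hypothetical.

```python
def immediate_neighbors(joining_id, responder_ids, size=256):
    """Pick the closest responder in each ring direction.

    Returns (predecessor_id, successor_id). The size of 256 is an assumption;
    the responder IDs in the usage below are hypothetical.
    """
    others = [n for n in responder_ids if n != joining_id]
    if not others:
        raise ValueError("no other nodes acknowledged the introduction")
    # Successor: smallest clockwise distance from the joining node.
    successor = min(others, key=lambda n: (n - joining_id) % size)
    # Predecessor: smallest counter-clockwise distance from the joining node.
    predecessor = min(others, key=lambda n: (joining_id - n) % size)
    return predecessor, successor


# A node joining at ID=44 hears ACKs from four hypothetical nodes.
pred, succ = immediate_neighbors(44, [254, 64, 135, 200])
```

Here `pred` is 254 and `succ` is 64. The remaining responders are part of the neighborhood but are not immediately adjacent.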

Method 2800 includes an act of the joining node indicating to one of the immediately adjacent nodes selected from among the immediately adjacent predecessor node and an immediately adjacent successor node, the intent of the joining node to take id-space ownership for a portion of the id-space between the joining node and the selected immediately adjacent node (act 2810). For example, joining node 2710 may indicate to immediately adjacent node 1 (2720) selected from among immediately adjacent node 1 (2720) and immediately adjacent node 2 (2730), the intent of joining node 2710 to take id-space ownership for a portion of the id-space between joining node 2710 and selected immediately adjacent node 2720. As explained above, id-space may include an identifier range (unoccupied or otherwise) for which a given node is responsible. For example, id-space may include a numerical range of node identifiers for which a given node is responsible.

In some embodiments, such as in state diagram 3000, the act of the joining node 3005 indicating to one of the immediately adjacent nodes selected from among the immediately adjacent predecessor node and an immediately adjacent successor node, the intent of the joining node to take id-space ownership for a portion of the id-space between the joining node 3005 and the selected immediately adjacent node 3010 comprises an act of sending a token request 3031 to a selected immediately adjacent node 3010 from among the immediately adjacent predecessor node 3010 and an immediately adjacent successor node 3015. The token request includes (i) a node identifier, such that only the node with that node identifier is capable of replying, and (ii) a first time-to-live duration value 3031, the first time-to-live duration value indicative of a duration for which the joining node 3005 can assume that a monitoring relationship with the selected immediately adjacent node is active.

In some cases, the token request message 3031 includes a marker indicating an updated status of the joining node's 3005 expected ownership range. Time-to-live values (TTLs) and relationship monitoring may be substantially the same as described in method 2000 of FIG. 20.
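A time-to-live value of this kind can be thought of as a lease on a monitoring relationship: the holder may assume the relationship is active only until the lease runs out. The sketch below is a generic lease check under that reading, not the procedure of method 2000. The class `MonitoringLease`, its fields, and the numeric times are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonitoringLease:
    """Hypothetical record of a one-way monitoring relationship with a TTL."""
    subject_id: int     # node that the lease holder treats as monitored
    granted_at: float   # time the TTL was granted, in seconds
    ttl: float          # time-to-live duration, in seconds

    def is_active(self, now):
        # The relationship is assumed active strictly before the TTL elapses.
        return now < self.granted_at + self.ttl

    def renew(self, now, ttl):
        # A renewal restarts the lease from the current time.
        self.granted_at = now
        self.ttl = ttl


lease = MonitoringLease(subject_id=254, granted_at=100.0, ttl=30.0)
active_early = lease.is_active(129.0)
active_late = lease.is_active(130.0)
```

With these values, `active_early` is `True` and `active_late` is `False`. Passing the current time in as a parameter keeps the check deterministic for illustration.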

Method 2800 includes an act of the joining node initiating a one-way monitoring relationship with the selected immediately adjacent node (act 2815). For example, joining node 2710 may initiate a one-way monitoring relationship with immediately adjacent node 1 (2720) as indicated in monitoring relationship indication 2712. In such a monitoring relationship, joining node 2710 may agree to monitor a certain range of node identifiers. In some cases, a range may include identifiers between those of immediately adjacent node 2720 and immediately adjacent node 2730.

Method 2800 includes an act of a first selected immediately adjacent node receiving an indication from the joining node indicating the intent of the joining node to take id-space ownership for a portion of the id-space between the joining node and the first selected immediately adjacent node (act 2820). For example, immediately adjacent node 1 (2720) may receive an indication (e.g. id-space ownership indication 2711) from joining node 2710 indicating the intent of joining node 2710 to take id-space ownership for a portion of the id-space between the joining node and node 2720.

Method 2800 includes an act of the first selected immediately adjacent node receiving an indication from the joining node of the joining node's intent to initiate a one-way monitoring relationship with the selected immediately adjacent node (act 2825). For example, immediately adjacent node 1 (2720) may receive an indication (e.g. monitoring relationship indication 2712) from joining node 2710 of the joining node's intent to initiate a one-way monitoring relationship with immediately adjacent node 2720.

Method 2800 includes an act of the first selected immediately adjacent node sending an indication to the joining node that indicates acceptance of the joining node's intent to take id-space ownership for a portion of the id-space between the joining node and the first selected immediately adjacent node and indicates establishment of a one-way monitoring relationship between the first selected immediately adjacent node and the joining node (act 2830). For example, immediately adjacent node 1 (2720) may send indication 2713 to joining node 2710 indicating acceptance of the joining node's intent to take id-space ownership for a portion of the id-space between joining node 2710 and immediately adjacent node 2720 (e.g. id-space ownership acceptance 2713A) and indicating establishment of a one-way monitoring relationship between immediately adjacent node 2720 and joining node 2710 (e.g. monitoring relationship establishment 2713B).

Method 2800 includes an act of the joining node receiving an indication from the selected immediately adjacent node that indicates acceptance of the joining node's intent to take id-space ownership for a portion of the id-space between the joining node and the selected immediately adjacent node and indicates establishment of a one-way monitoring relationship between the selected immediately adjacent node and the joining node (act 2835). For example, joining node 2710 may receive indication 2713 from immediately adjacent node 1 (2720) that indicates acceptance of the joining node's intent to take id-space ownership for a portion of the id-space between joining node 2710 and immediately adjacent node 2720 and indicates establishment of a one-way monitoring relationship between immediately adjacent node 2720 and joining node 2710.

In some embodiments, such as in state diagram 3000, the act of the joining node 3005 receiving an indication from the selected immediately adjacent node 3010 that indicates acceptance of the joining node's intent to take id-space ownership for a portion of the id-space between the joining node and the selected immediately adjacent node and indicates establishment of a one-way monitoring relationship between the selected immediately adjacent node 3010 and the joining node 3005 comprises an act of receiving a first token transfer 3032 from the selected immediately adjacent node 3010. The first token transfer includes (i) the joining node's ownership range of unoccupied node identifiers in the ring of nodes between the joining node 3005 and the selected immediately adjacent node 3010, (ii) a second time-to-live duration value 3032, the second time-to-live duration value indicative of a duration for which the selected immediately adjacent node can assume that a monitoring relationship with the joining node 3005 is active, and (iii) a first establish grant indicative of the selected immediately adjacent node 3010 monitoring the joining node.

Method 2800 includes an act of the joining node agreeing to participate in a one-way monitoring relationship with the selected immediately adjacent node (act 2840). For example, joining node 2710 may agree to participate (e.g. monitoring relationship agreement 2714) in a one-way monitoring relationship with immediately adjacent node 1 (2720). In some embodiments, such as in state diagram 3000, the act of the joining node 3005 agreeing to participate in a one-way monitoring relationship with the selected immediately adjacent node 3010 comprises an act of sending an acknowledgement message 3033 to the selected immediately adjacent node 3010, the acknowledgement message 3033 including a first ownership range between the joining node 3005 and the selected immediately adjacent node 3010, and a second establish grant indicative of the joining node monitoring the selected immediately adjacent node.

Referring again to FIGS. 27 and 28, method 2800 includes an act of the first selected immediately adjacent node receiving an agreement from the joining node agreeing to participate in a one-way monitoring relationship with the selected immediately adjacent node (act 2845). For example, immediately adjacent node 2720 may receive an agreement (e.g. monitoring relationship agreement 2714) from joining node 2710 agreeing to participate in a one-way monitoring relationship with immediately adjacent node 2720.
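The exchange between the joining node and the selected immediately adjacent node (token request, token transfer, acknowledgement) can be sketched as a three-message handshake. The sketch below is an assumption-laden illustration rather than the described protocol: all class and field names are hypothetical, the range to hand over is supplied as a fixed input rather than derived, the TTL values are arbitrary, and message delivery is modeled as direct method calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRequest:
    joiner_id: int
    target_id: int      # only the node with this ID may reply
    ttl: float          # first time-to-live duration value


@dataclass(frozen=True)
class TokenTransfer:
    sender_id: int
    joiner_range: tuple     # inclusive (first, last) range ceded to the joiner
    ttl: float              # second time-to-live duration value
    establish_grant: bool   # sender monitors the joiner


@dataclass(frozen=True)
class Ack:
    joiner_id: int
    joiner_range: tuple
    establish_grant: bool   # joiner monitors the sender of the transfer


class AdjacentNode:
    def __init__(self, node_id, handover_range, ttl=30.0):
        self.node_id = node_id
        self.handover_range = handover_range  # chosen elsewhere; fixed in this sketch
        self.ttl = ttl
        self.monitors = set()       # IDs this node monitors
        self.monitored_by = set()   # IDs confirmed to monitor this node

    def on_token_request(self, req):
        if req.target_id != self.node_id:
            return None  # a request addressed to another node is ignored
        self.monitors.add(req.joiner_id)
        return TokenTransfer(self.node_id, self.handover_range, self.ttl, True)

    def on_ack(self, ack):
        ok = ack.establish_grant and ack.joiner_range == self.handover_range
        if ok:
            self.monitored_by.add(ack.joiner_id)
        return ok


def join_via(joiner_id, neighbor, ttl=30.0):
    """Run the handshake and return the range the joiner now owns."""
    transfer = neighbor.on_token_request(TokenRequest(joiner_id, neighbor.node_id, ttl))
    if transfer is None or not transfer.establish_grant:
        raise RuntimeError("selected neighbor did not accept the request")
    if not neighbor.on_ack(Ack(joiner_id, transfer.joiner_range, True)):
        raise RuntimeError("acknowledgement was rejected")
    return transfer.joiner_range


neighbor = AdjacentNode(node_id=254, handover_range=(24, 43))
owned = join_via(44, neighbor)
```

In this sketch `owned` is `(24, 43)`, and after the handshake each side records a monitoring relationship with the other.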

In some embodiments, selected immediately adjacent node 2720 may, additionally or alternatively, perform the acts of indicating to a second selected immediately adjacent node the first node's intent to terminate any monitoring relationships with the second selected immediately adjacent node, receiving an indication from the second selected immediately adjacent node indicating the second node's intent to terminate any monitoring relationships with the first selected immediately adjacent node, and acknowledging the second node's intent to terminate. For example, immediately adjacent node 1 (2720) may indicate to immediately adjacent node 2 (2730) node 1's intent to terminate any monitoring relationships with node 2 (2730). Immediately adjacent node 1 (2720) may also receive an indication from node 2 (2730) indicating node 2's intent to terminate any monitoring relationships with node 1. Immediately adjacent node 1 (2720) may also acknowledge node 2's intent to terminate.

In some cases, such as in state diagram 3000, immediately adjacent node 3 (3010) may be configured to indicate to immediately adjacent node 5 (3015) node 3's intent to terminate any monitoring relationships with node 5 (3015) in step 5 (3034) of the state diagram. Immediately adjacent node 3 (3010) may also receive an indication from node 5 (3015) indicating node 5's intent to terminate any monitoring relationships with node 3 in step 6 (3035) of the state diagram. Immediately adjacent node 3 (3010) may also acknowledge node 5's intent to terminate in step 7 (3036) of the state diagram. It should be noted that the steps (1-8) of state diagram 3000 may occur in series or in parallel. Thus, in some embodiments, all steps labeled (5), for example, may occur simultaneously and others may occur in series. Any combination of steps performed in series or parallel is possible.
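The three termination messages exchanged between the two formerly adjacent nodes can be sketched as follows. The sketch makes simplifying assumptions that are not stated in the description: each node drops its monitoring of the other at the moment it announces its intent, the messages are processed strictly in series, and delivery is modeled as appending to the receiver's log. The class `Peer` and the function `dissolve` are hypothetical names.

```python
class Peer:
    """Hypothetical ring node tracking which node IDs it currently monitors."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.monitoring = set()
        self.log = []   # messages received, as (kind, sender_id) tuples

    def send_terminate(self, other):
        # Announce intent to stop monitoring `other`; assumed to take effect locally at once.
        self.monitoring.discard(other.node_id)
        other.log.append(("terminate", self.node_id))

    def ack_terminate(self, other):
        # Acknowledge the terminate intent previously received from `other`.
        other.log.append(("terminate-ack", self.node_id))


def dissolve(first, second):
    """Run the three messages in series: intent, counterpart intent, acknowledgement."""
    first.send_terminate(second)    # corresponds to step 5 in the state diagram
    second.send_terminate(first)    # corresponds to step 6
    first.ack_terminate(second)     # corresponds to step 7


node3, node5 = Peer(3), Peer(5)
node3.monitoring.add(5)
node5.monitoring.add(3)
dissolve(node3, node5)
```

After `dissolve` runs, neither peer monitors the other. A parallel variant would be equally consistent with the description, since the steps may occur in series or in parallel.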

Method 2800 includes an act of a first selected immediately adjacent node, selected from among the immediately adjacent predecessor node and an immediately adjacent successor node, indicating to the joining node id-space ownership for the portion of id-space between the joining node and the first selected immediately adjacent node and establishment of a one-way monitoring relationship between the first selected immediately adjacent node and the joining node (act 2850). For example, immediately adjacent node 2 (2730), selected from among immediately adjacent node 1 (2720) and immediately adjacent node 2 (2730), may indicate (e.g. in indication 2723) to joining node 2710 id-space ownership for the portion of id-space between joining node 2710 and immediately adjacent node 2730 (e.g. in id-space ownership 2723A) and establishment of a one-way monitoring relationship between immediately adjacent node 2730 and joining node 2710 (e.g. in monitoring relationship establishment 2723B).

Method 2800 includes an act of the joining node receiving an indication from the other immediately adjacent node that indicates id-space ownership for the portion of id-space between the joining node and the other immediately adjacent node and indicates establishment of a one-way monitoring relationship between the other immediately adjacent node and the joining node (act 2855). For example, joining node 2710 may receive indication 2723 from immediately adjacent node 2 (2730) that indicates id-space ownership for the portion of id-space between joining node 2710 and immediately adjacent node 2730 and indicates establishment of a one-way monitoring relationship between immediately adjacent node 2730 and joining node 2710.

In some embodiments, such as in state diagram 3000, the act of the joining node 3005 receiving an indication from the other immediately adjacent node 3015 that indicates id-space ownership for the portion of id-space between the joining node 3005 and the other immediately adjacent node 3015 and indicates establishment of a one-way monitoring relationship between the other immediately adjacent node and the joining node comprises an act of receiving a second token transfer 3037 from the other immediately adjacent node 3015 in step 6, the second token transfer including the joining node's ownership range of unoccupied node identifiers between the joining node 3005 and the other immediately adjacent node 3015 and a third time-to-live duration value 3037, the third time-to-live duration value indicative of a duration for which the other immediately adjacent node 3015 can assume that a monitoring relationship with the joining node 3005 is active.
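By way of illustration only, the token transfer described above can be sketched as a small message type plus a handler on the joining node. The names `TokenTransfer`, `JoiningNode`, `owned_ranges` and `lease_expiry`, the numeric node identifiers, and the representation of a range as an inclusive pair are all assumptions made for this sketch and do not appear in the figures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TokenTransfer:
    """Hypothetical message: an ownership range plus a time-to-live value."""
    sender_id: int
    ownership_range: Tuple[int, int]  # inclusive (low, high), assumed layout
    ttl_seconds: float


@dataclass
class JoiningNode:
    node_id: int
    owned_ranges: List[Tuple[int, int]] = field(default_factory=list)
    # Time until which each sender may assume its monitoring relationship
    # with this node is active (assumed bookkeeping, keyed by sender id).
    lease_expiry: Dict[int, float] = field(default_factory=dict)

    def receive_token_transfer(self, transfer: TokenTransfer, now: float) -> None:
        low, high = transfer.ownership_range
        if low > high:
            raise ValueError("empty ownership range")
        # Take ownership of the transferred identifiers.
        self.owned_ranges.append((low, high))
        # Record how long the sender may assume the relationship is active.
        self.lease_expiry[transfer.sender_id] = now + transfer.ttl_seconds


node = JoiningNode(node_id=40)
node.receive_token_transfer(TokenTransfer(50, (41, 45), 30.0), now=100.0)
```

In this sketch the joining node with hypothetical identifier 40 receives the range 41 through 45 from a neighbor with identifier 50; how the range boundaries are chosen is outside the scope of the example.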

Referring again to FIGS. 27 and 28, method 2800 includes an act of the joining node indicating to the other immediately adjacent node the intent of the joining node to establish id-space ownership for a portion of the id-space between the joining node and the other immediately adjacent node (act 2860). For example, joining node 2710 may indicate (e.g. in id-space ownership indication 2721) to immediately adjacent node 2 (2730) the intent of joining node 2710 to establish id-space ownership for a portion of the id-space between joining node 2710 and immediately adjacent node 2730.

In some embodiments, such as in state diagram 3000, the act of the joining node 3005 indicating to the other immediately adjacent node 3015 the intent of the joining node to establish id-space ownership for a portion of the id-space between the joining node 3005 and the other immediately adjacent node 3015 comprises an act of sending an establishment request (3038 in step 7) to establish a second ownership range between the joining node 3005 and the other immediately adjacent node 3015, the establishment request 3038 including a second ownership range between the joining node 3005 and the other immediately adjacent node 3015, a fourth time-to-live duration 3038, the fourth time-to-live duration indicative of a duration for which the joining node 3005 can assume that a monitoring relationship with the other immediately adjacent node 3015 is active, and a third establish grant indicative of the joining node monitoring the other immediately adjacent node 3015.

Method 2800 includes an act of the joining node initiating a one-way monitoring relationship with the other immediately adjacent node (act 2865). For example, joining node 2710 may initiate (e.g. via monitoring relationship indication 2722) a one-way monitoring relationship with immediately adjacent node 2 (2730). In some embodiments, such as in state diagram 3000, the act of the joining node 3005 indicating to the other immediately adjacent node 3015 the intent of the joining node to establish id-space ownership for a portion of the id-space between the joining node 3005 and the other immediately adjacent node 3015 comprises an act of sending an establishment request 3038 to establish a second ownership range between the joining node 3005 and the other immediately adjacent node 3015, the establishment request 3038 including a second ownership range between the joining node and the other immediately adjacent node, a fourth time-to-live duration 3038, the fourth time-to-live duration indicative of a duration for which the joining node can assume that a monitoring relationship with the other immediately adjacent node is active, and a third establish grant 3038 indicative of the joining node monitoring the other immediately adjacent node.

Method 2800 includes an act of the first selected immediately adjacent node receiving an indication of the joining node's intent to establish id-space ownership for a portion of the id-space between the joining node and the first selected immediately adjacent node (act 2870). For example, immediately adjacent node 2 (2730) may receive id-space ownership indication 2721 indicating the joining node's intent to establish id-space ownership for a portion of the id-space between joining node 2710 and immediately adjacent node 2730.

Method 2800 includes an act of the first selected immediately adjacent node receiving an indication of the joining node's intent to initiate a one-way monitoring relationship with the first selected immediately adjacent node (act 2875). For example, immediately adjacent node 2 (2730) may receive monitoring relationship indication 2722 indicating the joining node's intent to initiate a one-way monitoring relationship with immediately adjacent node 2730.

Method 2800 includes an act of the first selected immediately adjacent node indicating to the joining node the first selected node's intent to establish a one-way monitoring relationship between the first selected node and the joining node (act 2880). For example, immediately adjacent node 2 (2730) may indicate to joining node 2710 (e.g. via monitoring relationship agreement 2724) the immediately adjacent node's intent to establish a one-way monitoring relationship between immediately adjacent node 2730 and joining node 2710.

In some cases, immediately adjacent node 2 (2730) may, additionally or alternatively, perform the acts of receiving an indication from a second selected immediately adjacent node indicating the second node's intent to terminate any monitoring relationships with the first selected immediately adjacent node, indicating to the second selected immediately adjacent node the first node's intent to terminate any monitoring relationships with the second selected immediately adjacent node, and receiving an acknowledgment acknowledging the first node's intent to terminate. The other immediately adjacent node may also acknowledge the indication from the second selected immediately adjacent node. For example, immediately adjacent node 2 (2730) may receive an indication from immediately adjacent node 1 (2720) indicating node 1's intent to terminate any monitoring relationships with node 2. Node 2 (2730) may also receive an acknowledgement (3036 in state diagram 3000) acknowledging node 2's intent to terminate. Node 2 (2730) may also acknowledge the indication from node 1 (2720).

Method 2800 includes an act of the joining node receiving an indication from the other immediately adjacent node indicating establishment of a one-way monitoring relationship between the other immediately adjacent node and the joining node (act 2885). For example, joining node 2710 may receive an indication from immediately adjacent node 2 (2730) (e.g. monitoring relationship agreement 2724) indicating establishment of a one-way monitoring relationship between immediately adjacent node 2730 and joining node 2710. In some embodiments, such as in state diagram 3000, the act of the joining node receiving an indication from the other immediately adjacent node indicating establishment of a one-way monitoring relationship between the other immediately adjacent node and the joining node comprises an act of receiving a fourth establish grant (e.g. 3039 in step 8) for the establishment request, the fourth establish grant indicative of the other adjacent node 3015 monitoring the joining node 3005.
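As a non-limiting sketch, the establishment request and establish grant exchange described above (steps 7 and 8 of state diagram 3000) can be modeled as two messages that together produce a pair of one-way monitoring relationships. The class names, field names, and numeric identifiers below are hypothetical and are chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass(frozen=True)
class EstablishRequest:
    sender_id: int
    ownership_range: Tuple[int, int]  # assumed inclusive (low, high) pair
    ttl_seconds: float
    grants_monitoring: bool  # the establish grant carried inside the request


@dataclass(frozen=True)
class EstablishGrant:
    sender_id: int       # node that now monitors the requester
    requester_id: int


@dataclass
class Node:
    node_id: int
    monitors: Set[int] = field(default_factory=set)      # nodes this node watches
    monitored_by: Set[int] = field(default_factory=set)  # nodes watching this node

    def make_request(self, peer_id: int, ownership_range: Tuple[int, int],
                     ttl_seconds: float) -> EstablishRequest:
        # The sender starts monitoring the peer and says so in the request.
        self.monitors.add(peer_id)
        return EstablishRequest(self.node_id, ownership_range, ttl_seconds, True)

    def handle_request(self, request: EstablishRequest) -> EstablishGrant:
        if request.grants_monitoring:
            self.monitored_by.add(request.sender_id)
        # Reply with a grant: this node now monitors the requester.
        self.monitors.add(request.sender_id)
        return EstablishGrant(self.node_id, request.sender_id)

    def handle_grant(self, grant: EstablishGrant) -> None:
        if grant.requester_id != self.node_id:
            raise ValueError("grant addressed to a different node")
        self.monitored_by.add(grant.sender_id)


joining, neighbor = Node(40), Node(50)
request = joining.make_request(50, (41, 45), ttl_seconds=30.0)
joining.handle_grant(neighbor.handle_request(request))
```

After the exchange each node monitors the other, which is two one-way relationships rather than a single bidirectional one; either direction could lapse or be terminated independently.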

Furthermore, joining node 2710 may receive a negative acknowledge (NAK) message from at least one of the nodes on the ring (e.g. immediately adjacent node 1 (2720)), where the NAK message includes an indication of the NAK sender's view of the neighborhood. Joining node 2710 may then update its own view of the neighborhood based on the NAK sender's view.
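One possible way to picture the NAK-driven update is sketched below. It assumes, purely for illustration, that a neighborhood view is a set of integer node identifiers, that the update is a set union, and that neighbors are found by sorted order with wrap-around; none of these details is specified above, and the function names are hypothetical.

```python
from typing import Optional, Set, Tuple


def update_view_from_nak(own_view: Set[int], nak_view: Set[int]) -> Set[int]:
    """Merge the NAK sender's view into the local view (assumed: set union)."""
    return own_view | nak_view


def ring_neighbors(node_id: int, view: Set[int]) -> Optional[Tuple[int, int]]:
    """Return (predecessor, successor) of node_id on the ring, or None."""
    others = sorted(n for n in view if n != node_id)
    if not others:
        return None
    # Wrap around the ring when no smaller (or larger) identifier exists.
    predecessor = max((n for n in others if n < node_id), default=others[-1])
    successor = min((n for n in others if n > node_id), default=others[0])
    return predecessor, successor


stale_view = {30, 60}
merged_view = update_view_from_nak(stale_view, {30, 50, 60})
before = ring_neighbors(40, stale_view)
after = ring_neighbors(40, merged_view)
```

Here a joining node with hypothetical identifier 40 initially believes its successor is 60; after merging the NAK sender's view it learns of node 50 and can retry the join against the correct neighbor.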

FIG. 29 illustrates a flowchart of a method 2900 for maintaining ring consistency when the leaving node leaves the ring of nodes. The method 2900 will now be described with frequent reference to the components and data of environment 2700 and state diagram 3100 of FIG. 31.

Method 2900 includes an act of the first selected immediately adjacent node receiving an indication from the leaving node indicating the leaving node's intent to leave the ring of nodes (act 2910). For example, immediately adjacent node 3 (3110) may receive an indication from leaving node 4 (3105) indicating the leaving node's intent to leave the ring of nodes. In some embodiments, the act of the first selected immediately adjacent node 3110 receiving an indication from the leaving node 3105 indicating the leaving node's intent to leave the ring of nodes comprises the first selected immediately adjacent node 3110 receiving a departure message 3121 (step 1 in state diagram 3100) from leaving node 3105, where the departure message includes an ownership range of node identifiers indicated as being owned by leaving node 3105.

Method 2900 includes an act of the first selected immediately adjacent node sending an indication to the second selected immediately adjacent node that indicates acceptance of the leaving node's intent to leave id-space ownership for a portion of the id-space between the leaving node and the first selected immediately adjacent node and indicates establishment of a one-way monitoring relationship between the first selected immediately adjacent node and the second selected immediately adjacent node (act 2920). For example, immediately adjacent node 3 (3110) may send an indication (e.g. Establish & TTL 3122) to immediately adjacent node 5 (3115) that indicates acceptance of the leaving node's intent to leave id-space ownership for a portion of the id-space between leaving node 3105 and immediately adjacent node 3110 and indicates establishment of a one-way monitoring relationship between immediately adjacent node 3 (3110) and immediately adjacent node 5 (3115).

In some embodiments, such as in state diagram 3100, the act of the first selected immediately adjacent node sending an indication to the second selected immediately adjacent node that indicates acceptance of the leaving node's intent to leave id-space ownership for a portion of the id-space between the leaving node and the first selected immediately adjacent node and indicates establishment of a one-way monitoring relationship between the first selected immediately adjacent node and the second selected immediately adjacent node comprises an act of the first selected immediately adjacent node 3110 sending a first establishment request 3122 (e.g. in step 2 of state diagram 3100) to the second selected immediately adjacent node 3115 to establish an ownership range between the first selected immediately adjacent node 3110 and the second selected immediately adjacent node 3115, the first establishment request including a first time-to-live duration 3122, the first time-to-live duration indicative of a duration for which the first selected immediately adjacent node 3110 can assume that a monitoring relationship with the second selected immediately adjacent node 3115 is active, and an act of the first adjacent node 3110 receiving a first establish grant 3123 (e.g. in step 3 of state diagram 3100) for the first establishment request 3122, the first establish grant 3123 indicative of the second adjacent node 3115 monitoring the first selected immediately adjacent node 3110.

Method 2900 includes an act of the first selected immediately adjacent node receiving an indication from the second selected immediately adjacent node that indicates acceptance of the first node's intent to assume id-space ownership for a portion of the id-space between the leaving node and the first selected immediately adjacent node and indicates establishment of a one-way monitoring relationship between the second selected immediately adjacent node and the first selected immediately adjacent node (act 2930). For example, immediately adjacent node 3 (3110) may receive an indication (e.g. Establish & TTL 3124) from immediately adjacent node 5 (3115) that indicates acceptance of immediately adjacent node 3's intent to assume id-space ownership for a portion of the id-space between leaving node 3105 and immediately adjacent node 3110, and indicates establishment of a one-way monitoring relationship between immediately adjacent node 3115 and immediately adjacent node 3110.

In some embodiments, such as in state diagram 3100, the act of the first selected immediately adjacent node receiving an indication from the second selected immediately adjacent node that indicates acceptance of the first node's intent to assume id-space ownership for a portion of the id-space between the leaving node and the first selected immediately adjacent node and indicates establishment of a one-way monitoring relationship between the second selected immediately adjacent node and the first selected immediately adjacent node comprises an act of the first adjacent node 3110 receiving a second establishment request (e.g. in step 2 of state diagram 3100) from the second adjacent node 3115 to establish an ownership range between the first adjacent node 3110 and the second adjacent node 3115, the second establishment request including a second time-to-live duration 3124, the second time-to-live duration indicative of a duration for which the second adjacent node 3115 can assume that a monitoring relationship with the first adjacent node 3110 is active, and an act of the first adjacent node 3110 sending a second establish grant 3123 (e.g. in step 3 of state diagram 3100) for the second establishment request, the second establish grant indicative of the first adjacent node 3110 monitoring the second adjacent node 3115.
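The leave sequence of method 2900 can be illustrated with the following simplified sketch. It assumes that each departure message carries the portion of the leaving node's range handed to its recipient, that ranges are inclusive integer pairs without wrap-around, and that a time-to-live value is tracked as an expiry time; the names `DepartureMessage`, `AdjacentNode` and `establish` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class DepartureMessage:
    leaving_id: int
    ownership_range: Tuple[int, int]  # portion handed to the recipient (assumed)


@dataclass
class AdjacentNode:
    node_id: int
    owned: Tuple[int, int]
    monitors: Set[int] = field(default_factory=set)
    lease_expiry: Dict[int, float] = field(default_factory=dict)

    def handle_departure(self, msg: DepartureMessage) -> None:
        low, high = self.owned
        m_low, m_high = msg.ownership_range
        # Absorb the departing range only if it touches the owned range.
        if m_low == high + 1:
            self.owned = (low, m_high)
        elif m_high + 1 == low:
            self.owned = (m_low, high)
        else:
            raise ValueError("departing range is not contiguous with owned range")
        self.monitors.discard(msg.leaving_id)  # stop watching the leaving node


def establish(requester: AdjacentNode, granter: AdjacentNode,
              ttl_seconds: float, now: float) -> None:
    """One establishment request followed by one establish grant."""
    # Request: the requester may assume the relationship is active until expiry.
    requester.lease_expiry[granter.node_id] = now + ttl_seconds
    # Grant: the granter is now monitoring the requester.
    granter.monitors.add(requester.node_id)


node3 = AdjacentNode(30, (26, 35), monitors={40})
node5 = AdjacentNode(50, (46, 55), monitors={40})
node3.handle_departure(DepartureMessage(40, (36, 40)))
node5.handle_departure(DepartureMessage(40, (41, 45)))
establish(node3, node5, ttl_seconds=30.0, now=0.0)  # request and grant
establish(node5, node3, ttl_seconds=30.0, now=0.0)  # reverse direction
```

In this sketch the two remaining neighbors end up owning the whole of the vacated range between them and monitoring each other, so the ring closes over the gap left by the leaving node.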

The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

What is claimed is:
 1. In a federation infrastructure of a ring of nodes configured for bi-directional routing, the ring of nodes including a leaving node, a first adjacent node adjacent to the leaving node on the ring, and a second adjacent node adjacent to the leaving node on the ring, a method for maintaining ring consistency when the leaving node leaves the ring of nodes, the method comprising: receiving, at the first adjacent node, an indication from the leaving node indicating the leaving node's intent to leave the ring of nodes; sending, by the first adjacent node, a first indication to the second adjacent node that: indicates acceptance of the leaving node's intent to leave id-space ownership for a portion of the id-space between the leaving node and the first adjacent node; and indicates establishment of a one-way monitoring relationship between the first adjacent node and the second adjacent node; and receiving, at the first adjacent node, a second indication from the second adjacent node that: indicates acceptance of the first adjacent node's intent to assume id-space ownership for the portion of the id-space between the leaving node and the first adjacent node; and indicates establishment of the one-way monitoring relationship between the second adjacent node and the first adjacent node.
 2. The method of claim 1, wherein the first adjacent node receiving the indication from the leaving node indicating the leaving node's intent to leave the ring of nodes comprises: receiving, at the first adjacent node, a departure message from the leaving node, the departure message including an ownership range of node identifiers indicated as being owned by the leaving node.
 3. The method of claim 1, wherein the first adjacent node sending the first indication to the second adjacent node comprises: sending, by the first adjacent node, an establishment request to the second adjacent node to establish an ownership range between the first adjacent node and the second adjacent node; and receiving, by the first adjacent node, an establish grant for the establishment request, the establish grant indicative of the second adjacent node monitoring the first adjacent node.
 4. The method of claim 3, wherein the establishment request includes a time-to-live duration, the time-to-live duration indicative of a duration for which the first adjacent node can assume a monitoring relationship with the second adjacent node is active.
 5. The method of claim 1, wherein the first adjacent node receiving the second indication from the second adjacent node comprises: receiving, at the first adjacent node, an establishment request from the second adjacent node to establish an ownership range between the first adjacent node and the second adjacent node; and sending, by the first adjacent node, an establish grant for the establishment request, the establish grant indicative of the first adjacent node monitoring the second adjacent node.
 6. The method of claim 5, the establishment request including a time-to-live duration, the time-to-live duration indicative of a duration for which the second adjacent node can assume a monitoring relationship with the first adjacent node is active.
 7. The method of claim 1, wherein one or both of the first adjacent node or the second adjacent node are immediately adjacent to the leaving node on the ring.
 8. The method of claim 1, wherein: the first adjacent node sending the first indication to the second adjacent node comprises: sending, by the first adjacent node, a first establishment request to the second adjacent node to establish an ownership range between the first adjacent node and the second adjacent node; and receiving, by the first adjacent node, a first establish grant for the establishment request, the first establish grant indicative of the second adjacent node monitoring the first adjacent node, and the first adjacent node receiving the second indication from the second adjacent node comprises: receiving, at the first adjacent node, a second establishment request from the second adjacent node to establish an ownership range between the first adjacent node and the second adjacent node; and sending, by the first adjacent node, a second establish grant for the establishment request, the second establish grant indicative of the first adjacent node monitoring the second adjacent node.
 9. The method of claim 8, wherein the first establishment request includes a first time-to-live duration, the first time-to-live duration indicative of a duration for which the first adjacent node can assume a monitoring relationship with the second adjacent node is active, and wherein the second establishment request includes a second time-to-live duration, the second time-to-live duration indicative of a duration for which the second adjacent node can assume a monitoring relationship with the first adjacent node is active.
 10. A computer system, comprising: one or more processors; and one or more computer-readable media having stored thereon computer-executable instructions that are structured such that, when executed at the one or more processors, the computer-executable instructions configure the computer system as a first adjacent node in a federation infrastructure of a ring of nodes configured for bi-directional routing, the ring of nodes including i) a leaving node, ii) the first adjacent node, and iii) a second adjacent node, the first and second adjacent nodes being adjacent to the leaving node on the ring, the first adjacent node being configured to perform at least the following: receive an indication from the leaving node indicating the leaving node's intent to leave the ring of nodes; send a first indication to the second adjacent node that: indicates acceptance of the leaving node's intent to leave id-space ownership for a portion of the id-space between the leaving node and the first adjacent node; and indicates establishment of a one-way monitoring relationship between the first adjacent node and the second adjacent node; and receive a second indication from the second adjacent node that: indicates acceptance of the first adjacent node's intent to assume id-space ownership for the portion of the id-space between the leaving node and the first adjacent node; and indicates establishment of the one-way monitoring relationship between the second adjacent node and the first adjacent node.
 11. The computer system of claim 10, wherein receiving the indication from the leaving node indicating the leaving node's intent to leave the ring of nodes comprises: receiving a departure message from the leaving node, the departure message including an ownership range of node identifiers indicated as being owned by the leaving node.
 12. The computer system of claim 10, wherein sending the first indication to the second adjacent node comprises: sending an establishment request to the second adjacent node to establish an ownership range between the first adjacent node and the second adjacent node; and receiving an establish grant for the establishment request, the establish grant indicative of the second adjacent node monitoring the first adjacent node.
 13. The computer system of claim 12, wherein the establishment request includes a time-to-live duration, the time-to-live duration indicative of a duration for which the first adjacent node can assume a monitoring relationship with the second adjacent node is active.
 14. The computer system of claim 10, wherein receiving the second indication from the second adjacent node comprises: receiving an establishment request from the second adjacent node to establish an ownership range between the first adjacent node and the second adjacent node; and sending an establish grant for the establishment request, the establish grant indicative of the first adjacent node monitoring the second adjacent node.
 15. The computer system of claim 14, the establishment request including a time-to-live duration, the time-to-live duration indicative of a duration for which the second adjacent node can assume a monitoring relationship with the first adjacent node is active.
 16. The computer system of claim 10, wherein one or both of the first adjacent node or the second adjacent node are immediately adjacent to the leaving node on the ring.
 17. The computer system of claim 10, wherein: sending the first indication to the second adjacent node comprises: sending a first establishment request to the second adjacent node to establish an ownership range between the first adjacent node and the second adjacent node; and receiving a first establish grant for the establishment request, the first establish grant indicative of the second adjacent node monitoring the first adjacent node, and receiving the second indication from the second adjacent node comprises: receiving a second establishment request from the second adjacent node to establish an ownership range between the first adjacent node and the second adjacent node; and sending a second establish grant for the establishment request, the second establish grant indicative of the first adjacent node monitoring the second adjacent node.
 18. The computer system of claim 17, wherein the first establishment request includes a first time-to-live duration, the first time-to-live duration indicative of a duration for which the first adjacent node can assume a monitoring relationship with the second adjacent node is active, and wherein the second establishment request includes a second time-to-live duration, the second time-to-live duration indicative of a duration for which the second adjacent node can assume a monitoring relationship with the first adjacent node is active.
 19. A computer program product comprising one or more computer-readable media having stored thereon computer-executable instructions that are structured such that, when executed at one or more processors, the computer-executable instructions configure a first adjacent node in a federation infrastructure of a ring of nodes configured for bi-directional routing, the ring of nodes including i) a leaving node, ii) the first adjacent node, and iii) a second adjacent node, the first and second adjacent nodes being adjacent to the leaving node on the ring, the first adjacent node being configured to perform at least the following: receive an indication from the leaving node indicating the leaving node's intent to leave the ring of nodes; send a first indication to the second adjacent node that: indicates acceptance of the leaving node's intent to leave id-space ownership for a portion of the id-space between the leaving node and the first adjacent node; and indicates establishment of a one-way monitoring relationship between the first adjacent node and the second adjacent node; and receive a second indication from the second adjacent node that: indicates acceptance of the first adjacent node's intent to assume id-space ownership for the portion of the id-space between the leaving node and the first adjacent node; and indicates establishment of the one-way monitoring relationship between the second adjacent node and the first adjacent node.
 20. The computer program product of claim 19, wherein: sending the first indication to the second adjacent node comprises: sending a first establishment request to the second adjacent node to establish an ownership range between the first adjacent node and the second adjacent node, the first establishment request including a first time-to-live duration, the first time-to-live duration indicative of a duration for which the first adjacent node can assume a monitoring relationship with the second adjacent node is active; and receiving a first establish grant for the establishment request, the first establish grant indicative of the second adjacent node monitoring the first adjacent node, and receiving the second indication from the second adjacent node comprises: receiving a second establishment request from the second adjacent node to establish an ownership range between the first adjacent node and the second adjacent node, the second establishment request including a second time-to-live duration, the second time-to-live duration indicative of a duration for which the second adjacent node can assume a monitoring relationship with the first adjacent node is active; and sending a second establish grant for the establishment request, the second establish grant indicative of the first adjacent node monitoring the second adjacent node.